INTRODUCTION, AREAS OF EXPERTISE AND SUMMARY OF OPINIONS
Q: Please state your name. 

A: My name is Janet Wittes. 

Q: What is your occupation? 

A: I am a statistician dealing primarily with biostatistics. 

Q: Where are you employed? 

A: I am president of a consulting firm called Statistics Collaborative Inc. in Washington, 

D.C. 

Q: Have you agreed to serve as an expert witness for the defendants in this case? 

A: Yes, I have. 

Q: What is the scope of the work that you performed in this case? 

A: I have reviewed two papers by Farrelly et al. that purport to evaluate the effectiveness of 

American Legacy Foundation’s “truth” campaign. I have looked at the scientific validity of the 
studies, their statistical soundness, and the scope of permissible statistical inferences that one 
may draw from them. In conjunction with my review of those studies, I have also performed 
some data analyses. 

Q: What are those two papers? 

A: They are 

• JD-065578, Farrelly MC, Healton CG, Davis KC, Messeri P, Hersey JC, Haviland 
ML. Getting to the truth: evaluating national tobacco countermarketing campaigns. 
AJPH 92:901-907, 2002, and 

• JD-025252, Farrelly MC, Davis KC, Haviland L, Messeri P, Healton CG. Evidence of 
a dose response relationship between “truth” antismoking ads and youth smoking 
1 prevalence. AJPH 95:425-431, 2005. 

2 Q: May we refer to the first of those papers as “Farrelly 2002” or “the 2002 paper” and 

3 the second as “Farrelly 2005” or “the 2005 paper?” 

4 A: Yes. 

5 Q: How did you go about your review? 

6 A: I started as if I were a peer reviewer, noting only what I saw in the paper. In that process, 

7 I identified some methodological questions that the papers did not address. I had the luxury - 

8 which an ordinary peer review would lack - of reviewing other materials germane to the issues I 

9 identified. 

10 Q: Briefly, what was the nature of those two papers? 

11 A: Farrelly 2002 analyzed the responses that 12-17 year-olds provided to questions in two 

12 cross-sectional surveys about attitudes concerning smoking and the tobacco industry and the 

13 respondents’ intent to smoke in the future. Farrelly 2005 also used repeated cross-sectional 

14 surveys, in this case to investigate changes in the prevalence of smoking associated with 

15 exposure to the “truth” campaign. 

16 Q: What did the authors of those two studies conclude? 

17 A: The authors of Farrelly 2002 concluded: “Whereas exposure to the ‘truth’ campaign 

18 positively changed youths’ attitudes toward tobacco, the Philip Morris campaign had a 

19 counterproductive influence.” The authors of Farrelly 2005 concluded: “The study showed that 

20 the campaign was associated with substantial declines in youth smoking and has accelerated 

21 recent declines in youth smoking prevalence.” 

22 Q: Do you agree with those conclusions? 

23 A: No, I don’t. 
1 Q: Briefly, what are your conclusions about those papers? 

2 A: Neither study has an architecture that is well suited for the process of causal inference. 

3 Both studies have serious flaws and limitations that compromise their validity. I have concluded 

4 that the data from neither paper are sufficient to draw causal inferences or come to conclusions 

5 about the effectiveness of the “truth” campaign. In addition, I have concluded that the data from 

6 the 2002 paper are insufficient to draw conclusions about the effectiveness of the “Think. Don’t 

7 Smoke” campaign. 

8 Q: Before I ask you to explain those opinions in detail, I would like to ask you about 

9 your background and qualifications as a statistician. Is JD-025230 a copy of your 

10 curriculum vitae? 

11 A: Yes, it is. 

12 Q: Would you describe your education for the Court? 

13 A: As an undergraduate I attended Radcliffe College, from which I received an A.B. in 

14 Mathematics in 1964. I then attended graduate school in statistics at Harvard University, earning 

15 my M.A. in 1965 and my Ph.D. in 1970. 

16 It took me a bit longer to complete my Ph.D. than usual because during that same period 

17 of time I gave birth to my first two children and was raising them as I was going to school. 

18 Q: What was the subject of your Ph.D. dissertation? 

19 A: The title was “Estimation of Population Size: the Bernoulli Census.” It dealt with 

20 “capture-recapture” methods in epidemiology. 

21 Q: Did any publications arise out of your dissertation? 

22 A: Yes. I published four papers directly from my dissertation (Wittes, Sidel 1968; Wittes 

23 1972; Wittes 1974; and Wittes, Colton and Sidel 1974) and one based on the methodology I 
1 developed (Goldberg and Wittes 1981). I also published a paper in Biometrics that came directly 

2 from my dissertation and was an application of the methods (Goldberg and Wittes 1978). In 

3 addition, I published a book chapter that was based on my dissertation. The book was Statistics 

4 by Example: Finding Models, edited by Frederick Mosteller and others (1973). Professor 

5 Mosteller was the chairman of the Department of Statistics at Harvard when I was a graduate 

6 student. That book was part of a project of the Joint Committee of the American Statistical 

7 Association and the National Council of Teachers of Mathematics on the Curriculum in Statistics 

8 and Probability for Grades K-12. It was an effort to educate students about the value and 

9 importance of statistics. 

10 Q: Could you briefly summarize your professional career after receiving your Ph.D.? 

11 A: Yes. I have served in academia, in the government, and for the past dozen years, I have 

12 directed my own biostatistical consulting firm. 

13 Q: What was your first job? 

14 A: My first job, and one that had a profound influence on me, was as research assistant for 

15 Jerry Cornfield, one of the most outstanding biostatisticians this country has ever had. He was 

16 assistant chief and then chief of the Biometry Section at the National Cancer Institute from the 

17 late 1940’s until the 1960’s. That was the group that did some of the earliest work on the link 

18 between cigarette smoking and lung cancer. Mr. Cornfield also was instrumental in developing 

19 the case-control study, one of the major tools in epidemiology, and he was the first biostatistician 

20 to prove how an odds ratio could provide a relatively unbiased estimate of relative risk, which 

21 was a development that greatly aided the utility of the case-control study design. And I consider 

22 him to be one of my mentors. During my two years with him, we worked on a grant that 

23 involved trying to determine how to make inferences about subgroups when a treatment appears 
1 to differ by subgroup. 

2 Q: What was the next stage of your career? 

3 A: I spent a few years as a part-time adjunct assistant professor in the Division of 

4 Epidemiology at the Columbia University School of Public Health while juggling kids and 

5 career. Then I taught statistics for eight years in the Department of Mathematical Science at 

6 Hunter College of the City University of New York, where I started as an Assistant Professor 

7 and later became a tenured Associate Professor. 

8 Q: When did you leave Hunter College? 

9 A: In 1982. I was offered the position as Chief of the Biostatistics Research Branch of the 

10 National Heart, Lung, and Blood Institute (NHLBI), which is part of NIH. My husband, who is 

11 an oncologist, agreed to move to Washington, where he took a position at the National Cancer 

12 Institute (NCI). 

13 Q: What were your responsibilities at NHLBI? 

14 A: I was Chief of the Biometrics Research Branch for the Institute, which meant that I was 

15 the chief biostatistician. I supervised a group of about eight biostatisticians. I assigned the 

16 projects, did trouble shooting, and was called in when problems arose, for example, a “safety 

17 signal,” an indication of harm from a supposedly safe drug. In addition, I did research in 

18 biostatistics. 

19 Q: What type of research did the Branch do? 

20 A: NHLBI did several kinds of studies. The first were lab studies and small clinical studies 

21 conducted entirely within NHLBI. The Branch served as the biostatisticians on these studies. 

22 Another type of study consisted of large, sometimes very large, randomized clinical trials on 

23 prevention and treatment that were performed by contractors whom we oversaw, such as a study 
1 on hypertension in the elderly, a study on diagnosing chronic obstructive pulmonary disease, 

2 early AIDS studies, and others. My third significant area of work involved biostatistical research 

3 methods. 

4 Q: How long did you have that position? 

5 A: I held that position until 1989 when my husband and I moved to New Haven. For the 

6 year we were in New Haven, I was a part-time biostatistician at the Veterans Administration 

7 Medical Center in West Haven, CT. 

8 Q: What did you do after you left New Haven and the VA Medical Center? 

9 A: My husband and I moved back to Washington in 1990. He returned to NCI, and I began 

10 getting phone calls asking me to assist on various projects. By the next year, I had my own 

11 company, Statistics Collaborative, with one employee besides myself. 

12 Q: How many employees do you have now? 

13 A: We have a staff of 23, of whom 11 are biostatisticians. 

14 Q: What type of work does Statistics Collaborative do? 

15 A: We work with researchers in government, industry, non-profit, and academic settings to 

16 provide statistical collaboration in the fields of clinical trials and epidemiology. 

17 Q: What types of clients do you serve? 

18 A: We work for firms in the pharmaceutical, medical device and biotechnology industries, as 

19 well as for the federal government and non-profit organizations. 

20 Q: What kind of work do you do for them? 

21 A: We do a wide variety of work on clinical trials, especially controlled trials, 

22 epidemiological studies, and other research, including designing, monitoring, analyzing and 

23 interpreting studies. Most of our work relates to studies of medical prevention and treatment, for 
1 example, reviewing a study design or deciding whether an “unsafety” signal in a study is real. 

2 Another typical situation is reanalysis of data when a study yields an unexpected result. I am 

3 often called in to see what the problem is. I might look over the design of the study, review or 

4 rewrite the statistical analysis plan, and critique the study. Sometimes I reanalyze the data. 

5 Q: You mentioned controlled clinical trials; what are they? 

6 A: A controlled clinical trial is a study that evaluates the effectiveness and safety of a 

7 medication, medical device, or other treatment by monitoring its effects on groups of people. At 

8 least one of the groups is a control group; the others are experimental groups. The trials we work 

9 on typically are randomized, meaning the participants are assigned by chance alone to different 

10 treatments or to a group that receives the control therapy. In many trials, the control therapy is a 

11 placebo. 

12 Q: Would you provide examples of some of your work in designing, analyzing and 

13 interpreting studies? 

14 A: Yes. 

15 • For a consortium of non-profit organizations and the United States Army, we are an 

16 integral part of the study team for clinical trials developing malaria vaccines. As a 

17 result of that work, members of my staff and I have co-authored a number of peer- 

18 reviewed scientific papers. 

19 • For small biotechnology companies, we have helped develop Phase I studies, Phase 

20 II/III studies, and other clinical trials for orphan drugs, which are drugs used to treat 

21 rare diseases. As a result of that work, we co-authored peer-reviewed scientific 

22 papers with the investigators and have helped drugs get approved by the United States 

23 Food and Drug Administration and foreign regulatory agencies. 
1 • For a large biotechnology company, we are responsible for the statistical design of 


2 pre-clinical and Phase I studies for cancer drugs. 

3 • For several cardiovascular device companies, we have provided the statistical design, 

4 protocol development, and analysis of device studies. The devices have included 

5 defibrillators, stents, and artificial heart valves. 

6 • For large pharmaceutical companies, we review protocols, analytical plans and 

7 submissions to the FDA. We have also provided statistical review of major drug 

8 development programs. 

9 Q: Would you provide examples of the work you have done in monitoring randomized 

10 clinical trials? 

11 A: Yes. One of our particular strengths is our experience in the statistical data monitoring of 

12 multi-center randomized clinical trials. 

13 • For a large biotechnology company developing treatment based on monoclonal 

14 antibody research, we served as the interim monitoring center for a study of an 

15 infection in children called meningococcemia. We were responsible for 

16 randomization, monitoring safety, convening the Data and Safety Monitoring 

17 Committee, and performing analyses of the data for submission to the FDA. 

18 • For a large pharmaceutical company, we have performed the interim data monitoring 

19 for studies of eye disease. 

20 • For a large pharmaceutical company, we performed the interim safety and efficacy 

21 monitoring for several cardiovascular studies. 

22 Q: What do you do for non-profit organizations? 

23 A: We provide the same kind of consultation that we do for our other clients. For example, 
1 for a non-profit organization developing a new research program, we served as the statistical 

2 center for the development of clinical studies of HIV and for an observational database study of 

3 HIV disease. 

4 Q: What have you done for the government? 

5 A: I interact with the government in a number of ways. For NIH and the Army, Statistics 

6 Collaborative provides statistical support. We help design, analyze, and interpret research 

7 studies. In addition, I get involved in the review process for NIH grants and contracts. For the 

8 FDA, I have served on official Advisory Committees that provide advice to FDA scientists 

9 concerning approvability of drugs, biologics and devices. I also serve, or have served, on many 

10 Data Safety Monitoring Boards for various institutes at the NIH. I chair several of these 

11 committees. 

12 Q: Please tell the court about the statistical support that you provide for government 

13 studies. 

14 A: For example: 

15 • We have served as the statistical consulting group for the research studies of the 

16 National Institutes of Health Critical Care department. 

17 • For the Walter Reed Army Institute of Research, we are the statistical coordinating 

18 center for malaria vaccine studies. 

19 • For the USDA, we have provided analyses of sensory studies to help determine how 

20 to cook meats for school lunches. 

21 • For the Department of Veterans Affairs, we consult on studies conducted through the 

22 Cooperative Studies Program Coordinating Centers. 

23 Q: You mentioned that you have served on FDA Advisory Committees; what is an FDA 
1 Advisory Committee? 

2 A: It is a committee formed to provide advice to FDA scientists concerning approval of 

3 drugs, biologics and devices. Regular members become Special Government Employees and 

4 serve for a term of four years. 

5 Q: What FDA Advisory Committees have you served on? 

6 A: I was a member of the FDA’s Circulatory System Devices Panel from 1999-2003 and 

7 have been a member of several ad hoc FDA advisory panels in various disease areas. 

8 Q: You also mentioned that you have served on Data Safety Monitoring Boards; what 

9 are they? 

10 A: Data Safety Monitoring Boards, or DSMBs, are groups of experts who review the 

11 conduct, progress and interim results of clinical trials. I have served on these boards for the NIH, 

12 the Veteran’s Administration, and industry. 

13 Q: On what DSMBs have you served? 

14 A: From 1993 through 2003, I chaired the DSMB of the Women’s Health Initiative Clinical 

15 Trial. I currently chair the DSMBs for the Retinitis Pigmentosa Trial of the National Eye 

16 Institute and the Folic Acid for Vascular Outcome in Transplantation Recipients Trial 

17 (FAVORIT) for the National Institute of Diabetes and Kidney Diseases. In addition, I have been 

18 a member or chair of about three dozen other DSMBs or other scientific advisory committees for 

19 NIH, the VA, industry and non-profit medical groups. 

20 Q: What is the Women’s Health Initiative? 

21 A: It is a major ongoing research program established by NIH to review the efficacy and 

22 safety of various interventions to prevent cardiovascular disease, cancer, and osteoporosis in 

23 post-menopausal women. One intervention was the use of post-menopausal hormones. 
1 Q: I plan to ask you more about this later, but, briefly, what did the Women’s Health 

2 Initiative find about post-menopausal hormone use? 

3 A: The NIH and many of its advisors - including me - believed that even though 

4 observational studies had linked hormone replacement therapy to reduced incidence of heart 

5 disease and reduced risk factors for heart disease, those studies were inadequate. We argued, in 

6 the face of considerable opposition, that a randomized clinical trial was necessary to decide the 

7 issue. Ultimately, a pair of trials was run under the WHI. They demonstrated that, depending on 

8 their composition, HRTs, far from reducing heart disease, led to either no or little benefit, or 

9 actually caused increases in heart disease. 

10 Q: Are you a member of any professional societies? 

11 A: Yes. I am an elected Fellow of the American Statistical Association and former chair of 

12 its Biometrics Section; a member and elected Fellow of the American Association for the 

13 Advancement of Science; a member and Council member of the International Biometric Society, 

14 for which I have previously served as President of the Eastern North American Region and 

15 Treasurer; current chair of the Policy Committee and former President of the Society for Clinical 

16 Trials; an elected member of the International Statistical Institute; and a Fellow of the Royal 

17 Statistical Society. 

18 Q: You indicated that you are a “Fellow” of some of these organizations. What does it 

19 mean to be a “Fellow?” 

20 A: For some societies, “Fellow” is just another name for “member.” For example, I am 

21 officially a “Fellow” of the Royal Statistical Society, but so is every member. All you have to do 

22 to become a Fellow is pay dues. For other societies, one must be elected to be a Fellow by one’s 

23 peers, and it is considered an honor. For example, the American Statistical Association awards 
1 the title of Fellow as an honor to members who have made outstanding contributions in some 

2 aspect of statistical work. According to the Association's bylaws, the number of Fellows elected 

3 each year cannot exceed one-third of one percent of the full members. I have been a Fellow 

4 since 1989. 

5 Q: You said that you are a Council member of the International Biometric Society 

6 (IBS) and are a former President of the Eastern North American Region of that 

7 organization; what is the International Biometric Society? 

8 A: Biometrics in this context is another term for statistics in the biological, medical, and 

9 agricultural sciences. The IBS is the leading worldwide organization in the field. The Society's 

10 38 regions and national groups range, alphabetically, from Argentina to Zimbabwe. I belong to 

11 the Eastern North American Region, which we fondly pronounce ee-nar, the largest region. 

12 Q: You said that you are an elected member of the International Institute of Statistics; 

13 what is that organization? 

14 A: This is an organization of only about 2000 members from 133 countries. Election, based 

15 on recognition as a “definitive leader in the field of statistics,” is required for membership. 

16 Q: Finally, you said you are a former president of the Society for Clinical Trials; would 

17 you describe that organization. 

18 A: The SCT is an international professional organization dedicated to the development and 

19 dissemination of knowledge about the design, conduct, and analysis of government and industry- 

20 sponsored clinical trials and related health care research methodologies. 

21 Q: Have you served on the editorial boards of scientific journals? 

22 A: Yes. From 1993-98, I served as Editor-in-Chief of Controlled Clinical Trials, which is 

23 now known as Clinical Trials. This is the official journal of the Society for Clinical Trials. I am 


Written Direct: Janet Wittes, Ph.D., US v. PM, 99-cv-02496 (D.D.C.) (GK) 12 

1 currently on the editorial boards of three journals: Cardiovascular Clinical Trials Forum, 

2 Statistics in Medicine, and Clinical Trials. I have been a peer reviewer for many journals in 

3 statistics and medicine, including Biometrics, Statistics in Medicine, Journal of the American 

4 Statistical Association, Clinical Trials, Science, Journal of the American Medical Association, 

5 American Journal of Public Health, and a number of specialty medical journals. 

6 Q: Have you given presentations to professional, academic and government groups? 

7 A: Yes, I have given approximately 100 invited presentations altogether. In recent years I 

8 have averaged about five presentations a year. 

9 Q: Before what groups have you given presentations? 

10 A: A wide variety of groups. Last year, I gave papers to an FDA conference on Bayesian 

11 methods in Clinical Trials, to the Society of Clinical Trials, to the Joint Statistical Meeting, and 

12 to the Biopharmaceutical Applied Statistics Symposium (BASS XI), including the Keynote 

13 Address. So far this year, I have given two presentations to the meeting of the Eastern North 

14 American Region of the International Biometrics Society and on May 22, I am scheduled to give 

15 one to the meeting of the Society for Clinical Trials. I also have given talks to such 

16 organizations as the NIH, local chapters of the American Statistical Association, universities in 

17 the U.S. and abroad, and professional societies in medicine. 

18 Q: Have you also written scholarly articles and books? 

19 A: Yes, I have. I have authored or co-authored 19 book chapters and books, published 

20 approximately 100 peer-reviewed articles and written approximately 20 non-peer-reviewed 

21 discussions, letters, and proceedings in professional journals. 

22 Q: Are you the lead author on many of your publications? 

23 A: I am often the lead author on publications regarding statistical methods. However, I am 
1 not generally the lead author on medical papers or on papers in non-statistical journals. It is 

2 unusual for a statistician to be the lead author on such papers. 

3 Q: Have you continued publishing since you founded your consulting firm? 

4 A: Yes, I have. Since starting Statistics Collaborative in 1991, I have published more than 

5 75 articles, nearly all of them peer-reviewed. 

6 Q: Why have you continued to publish since going into consulting? 

7 A: I do it for the same reason I participate in NIH reviews and professional societies: I love 

8 that type of scientific work, and it allows me to contribute to the scientific community. I 

9 encourage the other biostatisticians on our staff to do the same thing. 

10 Q: Let me ask you about one publication on your curriculum vitae. You were a co- 

11 author of the article, “Cardiovascular Risk Associated with Celecoxib in a Clinical Trial 

12 for Colorectal Adenoma Prevention,” published in the New England Journal of Medicine 

13 on March 17, 2005. Would you tell the court about that publication? 

14 A: Yes. That was a study of a drug in a class of drugs called COX-2 inhibitors used as pain 

15 relievers. The drug, celecoxib, was marketed under the trade name “Celebrex.” Another COX-2 

16 inhibitor, marketed as Vioxx, had been observed to be associated with an increased incidence of 

17 cardiovascular events, and was voluntarily withdrawn from the market. As a result, the DSMBs 

18 of two ongoing trials of celecoxib formed an independent committee composed of cardiologists 

19 and a biostatistician to reassess the cardiovascular safety data for this drug. I was asked to serve 

20 as the biostatistician on that independent committee. 

21 Q: What did your committee find? 

22 A: As our article in the New England Journal of Medicine states, we found that “Celecoxib 

23 use was associated with a dose-related increase in the composite end point of death from 
1 cardiovascular causes, myocardial infarction, stroke, or heart failure” and that “these data 

2 provide further evidence that the use of COX-2 inhibitors may increase the risk of serious 

3 cardiovascular events.” 

4 Q: You have described work that you have done on randomized clinical trials; does 

5 most of your work relate to that kind of study rather than observational studies? 

6 A: Yes. 

7 Q: Earlier, you said that the Farrelly studies that you reviewed were observational 

8 studies; do you review observational studies in your professional work? 

9 A: Yes, I review observational studies frequently in my professional work. Often, as in the 

10 case of the HRT and celecoxib (Celebrex) literature, observational studies precede the 

11 randomized clinical trials. It is often problems or deficiencies in the observational studies that 

12 point up the need for clinical trials. And in designing the clinical trial, we have to evaluate the 

13 observational studies thoroughly. That, for example, is precisely what we did in the case of the 

14 hormone therapy studies on the Women’s Health Initiative when we argued that randomized 

15 trials were necessary. I have also been involved in the design and analysis of large observational 

16 studies. 

17 Q: Let’s discuss studies having to do with behavioral endpoints. On your curriculum 

18 vitae, you state that you have been a member of the faculty of the NIH Summer Institute on 

19 Design and Conduct of Clinical Trials Involving Behavioral Interventions; would you 

20 describe that NIH program? 

21 A: NIH was concerned about the quality of the applications for grants in behavioral 

22 medicine. So they set up an intensive 2-week summer course to train behavioral scientists to 

23 design studies that would assess causal links in behavioral outcomes. I was a member of the 
1 faculty for the second and third years of the program in 2002 and 2003, and I will be on the 

2 faculty again this year. 

3 Q: What behavioral outcomes were being analyzed in the studies underlying those 

4 grant applications? 

5 A: Among others, spousal abuse, tobacco cessation, smoking initiation, alcoholism, and 

6 psychosocial issues related to chronic illness. 

7 Q: What areas did NIH invite you to teach? 

8 A: I taught about study architectures, including the importance of randomization in the 

9 assessment of causation, and the various types of pitfalls and problems with observational 

10 studies, as well as data monitoring and handling missing data. 

11 Q: Are you a behavioral medicine specialist? 

12 A: No. 

13 Q: Why were you asked to teach behavioral scientists about study architectures and the 

14 problems to avoid in designing and running behavioral studies? 

15 A: The principles of statistics and biostatistics apply independently of the variables being 

16 studied. Jerry Cornfield used to say that when he was first faced with a statistical problem he 

17 would change all the nouns to letters, like “A,” “B,” and “C.” His point, with which I agree, was 

18 that the principles of statistics and statistical inference are universal, regardless of the specific 

19 subject. 

20 Q: Before I ask you about the Farrelly studies, I want to ask about one more aspect of 

21 your experience: Have you ever testified before? 

22 A: I have never testified in a trial before. I have given one deposition as an expert witness 

23 for a manufacturer of power tools. That was about eight years ago. 
1 Q: Aside from the case in which you gave the deposition, have you ever been retained 

2 as an expert witness before this case? 

3 A: No. 

4 Q: Have you ever consulted in litigation aside from that case and this one? 

5 A: No. 

6 Q: Why did you agree to do it this time; are you a fan of the tobacco industry? 

7 A: Not at all. In fact, I was very reluctant to accept the engagement because it was the 

8 tobacco industry that was asking. 

9 Q: Why did you agree to do it then? 

10 A: The basic reason is that I care about good science and am distressed by bad science. I 

11 reviewed these two studies. It was clear that they were poorly done and did not provide support 

12 for the claims the authors made about them. The scientific community and the public at large are 

13 badly served by flawed studies. In this case, it is important to know how best to prevent kids 

14 from smoking. But when you have a paper that purports to show a method to do that and the 

15 paper can’t support the contention, to adopt that method would be a waste of an opportunity and, 

16 perhaps, even counterproductive. 

17 Q: These studies were published in peer-reviewed journals; isn’t that a sufficient 

18 guarantee of reliability? 

19 A: Unfortunately, it is not. Studies with serious scientific flaws can get by the peer-review 

20 process. Studies with statistical flaws - some serious - are common in scientific journals. The 

21 problem with peer review of statistical methods is not unique to any specific area of research. 

22 Peer review is not an adequate scientific assurance that the conclusions the authors have reached 

23 are reliable or based on sound statistical methods. 
1 FARRELLY (2002) 

2 Summary of Study 

3 Q: Let’s discuss some specifics about the Farrelly (2002) study. Would you generally 

4 describe the study design? 

5 A: Yes. The study incorporated data from two cross-sectional point prevalence sample 

6 surveys that were conducted over the telephone on behalf of ALF. Those surveys were part of 

7 the Legacy Media Tracking Survey, or LMTS. The first survey wave, called LMTS-I, was 

8 conducted in late 1999, before the “truth” campaign had begun, but after the start of another 

9 youth anti-smoking campaign sponsored by Philip Morris called “Think. Don’t Smoke.” The 

10 second wave, LMTS-II, was conducted using a different survey sample, in the Fall of 2000. 

11 Among other things, the surveys asked respondents many questions about their attitudes toward 

12 smoking and tobacco companies, their awareness of antismoking campaigns, and their intentions 

13 about smoking. 

14 Study Architecture, Bias, Causal Inference

15 Q: Dr. Wittes, where does the process begin in evaluating whether a study or group of 

16 studies may point to a causal relationship? 

17 A: The logical starting point is the study hypothesis, and the study design or architecture. 

18 The design of a study is important in evaluating the contribution it may make to evaluating a 

19 proposed causal relationship. 

20 Q: Why is study design important? 

21 A: The investigation of causation requires the ability to isolate whether a specific variable of 

22 interest has a particular impact or causes a particular change. And in order to do that in the real 

23 world, the investigator needs study architectures and study methods that can validly assess that 
1 relationship. Although a variety of research designs are available to evaluate possible

2 relationships between variables, some are better suited to provide information that is relevant to 

3 causal inference than others. 

4 Q: With that in mind, how should investigators choose study designs? 

5 A: Study designs should meet the objective of the study. For instance, if the investigator is 

6 interested in studying the prevalence of a variable, a well-designed cross-sectional study is 

7 generally appropriate. If the investigator is interested in generating hypotheses for further 

8 evaluation, observational studies can be useful. If causal inference is the goal, however, the 

9 challenges put upon the study architecture are quite stringent. 

10 Q: In general, what is the “big picture” goal in designing a study that is best suited to 

11 causal inference? 

12 A: The goal in any study that will assist in causal inference is to compare people who are 

13 alike except for the particular variable being studied. 

14 Q: Referring you to page 19 of the 2004 Surgeon General's Report (US 88621), what 

15 does this report say about the issue? 

16 A: It states: “In a laboratory, scientists are able to predict, fairly confidently, the outcome in 

17 this counterfactual state by repeating an experimental procedure with every important factor 

18 tightly controlled, varying only the factor of interest. But in observational studies of humans, 

19 scientists must try to infer what the outcome would be in a counterfactual state by studying 

20 another group of persons who, at least on average, are substantively different in only one 

21 relevant variable, the exposure under study. The outcome in this second group is used to 

22 represent what would have occurred in the original group if it had been observed with a different 

23 exposure, as in its counterfactual state.” 
1 Q: Do you agree with that?

2 A: Yes, it isn’t precisely the language I would use, but I think it is a helpful illustration of 

3 the challenges in an observational study. 

4 Q: Are observational studies then at the top of the list in terms of study architectures 

5 that will assist in causal inference? 

6 A: No, that wasn’t the point the Surgeon General was making with the reference to 

7 observational studies. The point is that to draw causal inferences, investigators must somehow 

8 create a situation where people are “substantively different in only ... the exposure under study.” 

9 That is a tall order for an observational study. 

10 Q: Is there a hierarchy of study architectures that are most likely to assist in the 

11 process? 

12 A: Yes, and I have included a hierarchy often cited for clinical studies in the table below: 

Table III. Hierarchy of strength of evidence concerning efficacy of treatment

1. Anecdotal case reports
2. Case series without controls
3. Series with literature controls
4. Analyses using computer databases
5. “Case-control” observational studies
6. Series based on historical control groups
7. Single randomized controlled clinical trials
8. Confirmed randomized controlled clinical trials

13 

14 Q: Are randomized controlled trials generally considered the preferred design to assess 

15 causation? 
1 A: Yes. Causal inference requires that the investigator be able to drill down on “the one

2 substantively different relevant variable.” Because randomization yields the greatest likelihood 

3 that we are able to make that sort of assessment, a randomized controlled trial - with unbiased 

4 assessment of outcome and a statistical methodology that preserves the randomization - is the 

5 preferred study architecture. Using that type of study design, the investigator can directly control 

6 crucial variables and randomly assign subjects into study groups to make the groups as alike as 

7 possible except for the exposure. In contrast, investigators who use observational study 

8 architectures do not experimentally manipulate study variables. They observe the study 

9 environment in its undisturbed state, and must take the differences among groups as they find 

10 them. One of the strengths of randomization is that it produces groups balanced on both 

11 measured and unmeasured variables. By their nature, observational studies can deal only with 

12 measured, i.e., observed, variables. Because of these important differences, much stronger 

13 causal inferences can be made on the basis of well-designed experiments or randomized 

14 controlled trials than on observational studies. Many authorities consider experimental studies 

15 the “gold standard” for causal inference. 
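
[Editorial illustration, not part of the testimony.] The statement that randomization tends to balance groups on unmeasured as well as measured variables can be sketched with a small simulation. Everything in it is an assumption made for illustration: the function name `hidden_trait_gap`, the sample size, and the standard-normal "trait" standing in for some unmeasured characteristic.

```python
import random
import statistics


def hidden_trait_gap(n=10_000, seed=0):
    """Difference in the mean of an unmeasured trait between two randomized arms."""
    rng = random.Random(seed)
    # Hypothetical unmeasured trait (for example, an underlying health score).
    trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Random assignment: shuffle subject ids and split them in half.
    # The assignment never looks at the trait.
    ids = list(range(n))
    rng.shuffle(ids)
    treated, control = ids[: n // 2], ids[n // 2:]
    mean_treated = statistics.fmean(trait[i] for i in treated)
    mean_control = statistics.fmean(trait[i] for i in control)
    return mean_treated - mean_control


gap = hidden_trait_gap()
```

With several thousand subjects per arm the gap is typically close to zero even though the trait was never used in the assignment. The balance holds on average, not in every individual trial.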

16 Q: How does study architecture affect the various study methods an investigator must 

17 use if causal inference is the goal? 

18 A: Although randomization does not guarantee protection from bias, when you have data 

19 from a randomized study - with, as I said before, unbiased assessment of outcome and valid 

20 statistical methodology - in general, the challenges put upon the statistical methods to control for 

21 bias tend to be less severe. This protection comes from having controls for bias in the design 

22 stage. But without randomization, and therefore with essentially no control over critical 

23 variables afforded by the design, investigators simply observe and measure naturally occurring 
1 events, such as the experience of the “cases” in case-control studies, or the different incidence of

2 disease in the exposed and unexposed cohorts. They must then rely on statistical methods to 

3 control for influences that distort the comparison between groups, because the validity of the 

4 comparison depends upon how alike or well-matched the groups are, except for the focal 

5 variable. Most statistical methods for control used in observational studies, however, come with 

6 the “cost” of strong mathematical assumptions. For example, we must assume that the model 

7 includes all the variables that influence the relationship and that we know how to adjust for them 

8 in a way that removes bias. 

9 Q: Is that why observational studies can be less helpful in causal inference? 

10 A: Yes, at least partly. Another problem is that some observational studies may have 

11 problems in time sequencing - the exposure may come after the outcome. 

12 Q: What creates the challenges observational studies must meet in order to control for 

13 bias? 

14 A: The inherent variability of human beings and human behavior makes it difficult in 

15 observational studies to identify and then control variables in a way that will permit us to be 

16 confident that the groups being compared are alike except for the focal variable. 

17 Q: Is there a relationship between study design and the degree of caution one should 

18 use in assessing a particular study’s contribution to the body of evidence relevant to causal 

19 inference? 

20 A: In any study that is proposed to be used for etiologic inquiry, both the investigators and 

21 anyone interpreting the results of a study must be concerned with conclusions or associations that 

22 may be due to problems created by the study design and study methods, and which therefore 

23 prevent a “true” or valid measurement of the hypothesized effect. 
Q: Is the estimated magnitude of the association a factor in causal inference?

A: Yes. The magnitude or estimated size of the hypothesized relationship is important in 

assessing a possible etiologic relationship, particularly in observational studies, where we can 
almost never completely eliminate bias. In general, in assessing a possible etiologic relationship, 
the weaker the estimated association, the more the scientist analyzing the data must be concerned 
about bias, and the more skeptical the scientist should be about the validity of the relationship. 

Q: Would you explain that? 

A: Yes. Let us imagine we are considering strength of association as measured by an odds 

ratio. The closer the odds ratio is to 1.0, the more likely it is that the result was affected by an 
unknown confounding factor or source of bias that led to the apparent association. Thus, many 
biostatisticians recommend disregarding small odds ratios that arise in observational studies. For 
example, Jerry Cornfield used to say he dismissed observational odds ratios under 2 and he used 
to tell me he was very skeptical of those between 2 and 3. 

Q: Did you know Nathan Mantel? 

A: Yes. 

Q: Did he have an opinion on how large a magnitude of association offered some 

protection against bias in observational studies? 

A: Yes, he did. 

Q: What is JD-025237, “A Conversation with Nathan Mantel” in Statistical Science?

A: Yes. It was one of a series of interviews with NIH’s pioneering biostatisticians.

Q: Did Dr. Mantel describe his view about small odds ratios in this interview? 

A: Yes, he did. 

Q: What did he say? 
1 A: On page 92 he said:

Another interesting fact is that there is always the chance that some confounding
factor that you did not consider led to an observed association. I think in the
Mantel-Haenszel paper we had required that you should get a relative risk of at
least 1.5 as a protection against any of these confounding factors. I think
somewhere or other Cornfield had mentioned a relative risk of 2. I have been
reviewing data on the passive smoking lung cancer relationship. A metaanalysis
of many epidemiologic investigations yields a relative risk of 1.2 with a lower
limit on this at 1.04. That 1.04 could very easily be due to other confounding
factors as, for instance, the fact that some of the women who claimed to be
nonsmokers or never to have been smokers, actually were current or even
ex-smokers. However, the advocates of this hypothesis could not think of any factor
which should have been taken into account. My opposite position is that there
could be a factor that you don’t even know about. That was the point of the 1.5
relative risk requirement, to cover unanticipated factors.

17 Q: What is your opinion on this issue?


18 A: These rules of thumb are very approximate. One has to judge an observed odds ratio in


19 terms of what one knows about the underlying science. In general, I usually dismiss new 


20 observational findings if the odds ratio is less than about 1.5, believe them if the odds ratio is 


21 above, say, 10, and have a subjective sliding scale in between. But remember, all of this 

22 discussion pertains to association. We can have a whoppingly high odds ratio without a causal 


23 relationship at all. 
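
[Editorial illustration, not part of the testimony.] The odds ratio discussed above can be computed from a two-by-two table, and the witness's stated rules of thumb can be written down as a rough lookup. The counts below are hypothetical, the function names are invented for this sketch, and the cut-offs of 1.5 and 10 are the approximate ones the witness describes for odds ratios above 1.0. They are not a formal test.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from a 2x2 table: (a * d) / (b * c)."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


def rough_reading(or_value):
    """Very approximate reading of a new observational odds ratio above 1.0."""
    if or_value < 1.5:
        return "usually dismiss"
    if or_value > 10:
        return "tend to believe"
    return "subjective sliding scale"


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed are cases.
example = odds_ratio(20, 80, 10, 90)  # (20 * 90) / (80 * 10) = 2.25
reading = rough_reading(example)
```

As the testimony stresses, even a very large odds ratio measures association only. Nothing in this arithmetic addresses whether the relationship is causal.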


24 Q: Let me ask about bias. What are some of the important forms of bias in 

25 observational studies? 


26 A: Bias can creep in on little cat feet or leap in on big panther feet. It comes in many forms. 

27 One important form, selection bias, is a group of problems that occurs when subjects are selected 

28 in a way that skews or accentuates the differences between the groups being compared. 

29 Whenever the selection differs on characteristics other than the focal variable, there is the 

30 potential for selection bias. 

31 Q: Is selection bias an important issue in the studies you have evaluated? 
1 A: Selection bias is an important concern in both Farrelly studies. In the Farrelly 2002

2 study, I worry about selection bias because of the very high nonresponse rate. Often, people who 

3 are asked to participate in telephone surveys will not, either because the investigators are unable 

4 to make contact with them, or because they refuse to participate. That creates a significant 

5 problem because people who are unwilling or unable to respond to surveys, particularly where 

6 there are extensive efforts to follow up, are often quite different than those who are willing to 

7 respond. In the Farrelly 2005 study, selection bias could have arisen because of nonresponse or

8 because of absence from school. 

9 Q: Are there other important biases that are relevant to your opinion? 

10 A: Yes, those that occur when the “best” information to assess the relationship has not been

11 obtained or when the data are not correctly measured. 

12 Q: Could you provide an example? 

13 A: The study investigators may have selected a “proxy” variable as a substitute for 

14 something they were unable or chose not to measure. For instance, in the Farrelly 2005 study, 

15 the investigators chose a measure called “gross ratings points” as a substitute for what they stated 

16 they were modeling: “varied exposure to the ‘truth’ campaign over time.” 

17 Q: I have asked some general questions about observational studies. Where do the two 

18 Farrelly studies fit in with respect to the class of observational study designs? 

19 A: Both are cross-sectional surveys. These types of investigations, if properly conducted, 

20 can allow assessment of associations between variables, and are particularly useful in assessing 

21 prevalence. But because they generally assess “point prevalence,” meaning observations that 

22 apply only at one point in time, they lack the time dimension that is necessary for evaluating 

23 causal relationships. 
1 Q: The authors of the Farrelly 2005 paper say on page 430 that the cross-sectional

2 design of the study “weakens the strength of our causal inference.” But then they go on to 

3 suggest that by adding other cross sectional data from earlier years they could bolster the 

4 results. Are they correct? 

5 A: No. You don’t solve the problem of inadequate study architecture by increasing the 

6 sample size. And their suggestion is incorrect that students from different schools in different 

7 cross-sectional surveys taken in earlier years could serve as an unexposed “control group.” In 

8 order to be adequate controls, the students would, at a minimum, have to have provided 

9 responses during the time period of the exposure under study. 

10 Q: How do we know that bias can create a serious problem in observational studies? 

11 A: There are really two parts to that question. First, we know that bias in observational 

12 studies creates real problems because we often see considerable variation in the findings across 

13 similar observational studies. The direction of the proposed effect may even change from study 

14 to study. And even when observational studies have shown generally consistent findings, 

15 sometimes randomized controlled trials have demonstrated that the “consistent” group of 

16 observational studies came to the wrong conclusion. 

17 Q: You mentioned that cross-sectional studies lacked a “time dimension.” Why is that 

18 important? 

19 A: The reason is that even if they show an association, you cannot tell which way the causal 

20 direction runs. It may be that your focal variable causes the outcome variable, but it may instead 

21 be that the outcome variable causes the focal variable, or they both may be caused by something 

22 else. 

23 Q: Do you have specific experience with how biases, uncontrolled confounders and 
1 weak associations can affect the ability of scientists to reach valid conclusions from

2 observational studies about the effectiveness of particular treatments or therapies? 

3 A: Yes, I do. Earlier I mentioned that I served as chair of the Data Safety and Monitoring 

4 Board, or DSMB, for the Women’s Health Initiative. A question raised in the early 1990’s was 

5 whether post-menopausal hormone use protected women from heart disease. Several 

6 observational studies have suggested that estrogen and progestin decreased coronary disease risk 

7 by about 50%. At least one randomized study showed that women who took hormones had 

8 better lipid levels and reductions in other risk factors for heart disease. Nevertheless, we 

9 recommended experimental studies to help provide a definitive answer to the question of post- 

10 menopausal hormone use because of the possibility that the observational studies were biased. 

11 The recommendation to perform a randomized controlled trial on hormone replacement therapy 

12 met with substantial criticism from physicians and scientists who believed it was unethical to 

13 conduct randomized clinical trials on a treatment that they thought had been proved to be 

14 effective by the observational data. 

15 Q: What was the basis of the argument that randomized trials would be unethical? 

16 A: To conduct a randomized trial, some of the women would have to receive placebo. The 

17 argument was that you couldn’t ethically administer placebos and thereby deny treatment to 

18 these women because the observational studies demonstrated that hormones were effective. 

19 Q: Then why did NIH and the investigators insist on a randomized study? And why did 

20 the DSMB agree to monitor the study? 

21 A: The NIH, the investigators, and the DSMB did not think the treatment had been proven to 

22 be effective. We didn’t think women should be taking hormones to reduce coronary disease risk 

23 until the proof was there. We thought that the observational studies were subject to biases that 
1 might have affected their outcomes.

2 Q: What biases did those of you who favored a randomized trial suggest might have 

3 affected the conclusions that the observational studies reached? 

4 A: We were concerned primarily about selection bias. We thought that it was possible that 

5 the women who voluntarily chose to take estrogen would have been, on average, healthier, better 

6 educated, wealthier, and receiving better medical care than women who didn’t take estrogen. If 

7 so, that could mean that the reduced rate of heart disease that the observational studies found was 

8 not the result of hormone use, but rather a bias created by self-selection to hormone use. In other 

9 words, we knew that the studies couldn't tell us whether hormones made women healthier or, 

10 instead, healthy women were more likely to take hormones than other women. 
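
[Editorial illustration, not part of the testimony.] The self-selection concern described above can be made concrete with a small calculation in which hormone use has no effect at all on disease, yet users still show a lower crude risk because healthier women are more likely to be users. All of the numbers and the function name `risk_by_group` are hypothetical assumptions chosen for illustration. They are not drawn from the Women's Health Initiative or any other study.

```python
def risk_by_group(p_healthy, p_take, risk):
    """Return (risk among hormone users, risk among non-users).

    p_healthy: share of women who are 'healthy' (hypothetical).
    p_take:    probability of taking hormones, by health status.
    risk:      disease risk by health status; it does not depend on
               hormone use, so the true effect of hormones is null.
    """
    share = {"healthy": p_healthy, "unhealthy": 1.0 - p_healthy}
    users = {h: share[h] * p_take[h] for h in share}
    nonusers = {h: share[h] * (1.0 - p_take[h]) for h in share}
    risk_users = sum(users[h] * risk[h] for h in share) / sum(users.values())
    risk_nonusers = sum(nonusers[h] * risk[h] for h in share) / sum(nonusers.values())
    return risk_users, risk_nonusers


risk_users, risk_nonusers = risk_by_group(
    p_healthy=0.5,
    p_take={"healthy": 0.6, "unhealthy": 0.2},
    risk={"healthy": 0.05, "unhealthy": 0.15},
)
crude_risk_ratio = risk_users / risk_nonusers  # roughly 0.64 despite a null effect
```

Within each health stratum the risk is identical for users and non-users, so the apparent protection in the crude comparison comes entirely from who chose to take hormones. In this toy example stratifying on health status would remove the bias, but only because health status is assumed to be measured.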

11 Q: But hadn’t the observational studies on HRT been repeated? 

12 A: Yes, they had. But if they all suffered from the same biases, it wouldn’t matter how 

13 many studies were conducted; replication would simply repeat the bias. 

14 Q: Didn’t the observational studies attempt to control for factors like income, 

15 education, general health and access to medical care? 

16 A: Yes, they did. They controlled for the confounders that they could identify, but you 

17 never know if there are confounders that you don’t know about. Also, just because you can 

18 identify a confounding factor doesn't mean you can control for it. Sometimes the two groups of 

19 subjects are so different that our methods to control for confounders just don’t work. As I plan 

20 to explain later, that was a problem with Farrelly (2005). 

21 Q: What about the fact that the observational studies said that taking hormones cut the 

22 odds of heart disease in half: isn’t that a large reduction? 

23 A: It might seem to be. In statistical terms cutting a rate in half is the same as a doubling of 
1 a risk because 0.5 and 2.0 are the inverse of each other. In other words, women who were not

2 taking hormones had double the risk of heart attacks, according to these studies. But remember 

3 what I said about the magnitude of association. Even though these observational studies showed 

4 a doubling of a risk, we did not have scientific confidence that the association was real, and not 

5 the product of biases or confounders. 
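
[Editorial illustration, not part of the testimony.] The statement that a halving and a doubling are the same finding viewed from opposite sides is simple arithmetic. The event rates below are hypothetical and chosen only so that the ratio is 0.5.

```python
import math

# Hypothetical event rates per 1,000 women; not data from any study.
risk_users = 20 / 1000
risk_nonusers = 40 / 1000

rr_users_vs_nonusers = risk_users / risk_nonusers  # 0.5: users have half the risk
rr_nonusers_vs_users = risk_nonusers / risk_users  # 2.0: non-users have double the risk

# The two ratios are reciprocals, so their product is 1.
product = rr_users_vs_nonusers * rr_nonusers_vs_users
```

Which of the two numbers is reported depends only on which group is treated as the reference, so a relative risk of 0.5 deserves the same scrutiny as one of 2.0.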

6 Q: Was a randomized clinical trial of post-menopausal hormone use conducted? 

7 A: Yes. 

8 Q: What was the outcome? 

9 A: It showed that estrogen and progestin actually increased the risk of cardiovascular 

10 disease in post-menopausal women. In fact, we had to end the estrogen/progestin study early 

11 because it was clear that the risks from the treatment exceeded any potential benefits. The 

12 conclusions based on the observational data had simply been wrong. 

13 Q: Let me hand you JD-025233, which is a copy of testimony given before the 

14 Subcommittee on Human Rights and Wellness of the Committee on Government Reform of 

15 the United States House of Representatives by Barbara Alving, M.D., Acting Director of the

16 NHLBI. Do you know Dr. Alving? 

17 A: Yes, I do. 

18 Q: Would you read what she said on page 2? 

19 A: She said, “It is worth noting that at the outset of the WHI trial, many interested parties

20 believed that an outcome favoring estrogen was a foregone conclusion. Indeed, some doctors 

21 and researchers argued that such a trial was unethical because it would require half of the 

22 participating women to take placebos and thereby deny them the presumed benefits of hormones. 

23 Nonetheless, arguments in favor of randomized, placebo-controlled, clinical trials prevailed - 
1 and, as we now know, they were justified.”

2 Q: Does that statement accurately describe the claims that were made that the 

3 experimental trial would be unethical because there were observational trials that 

4 appeared to show benefits of post-menopausal use? 

5 A: Yes, it does. 

6 Q: Have there been other instances in which observational studies were later 

7 contradicted by randomized studies? 

8 A: Yes, there have been. Observational data had “shown” that beta-carotene protects against 

9 lung cancer, but the randomized CARET and Finnish cancer studies later showed that it 

10 increases the probability of developing lung cancer. Similarly, observational data had “shown” 

11 that vitamin E prevents heart disease, but the randomized HOPE-TOO study indicates that it 

12 causes an increase in heart failure (though these results have not been replicated). In these 

13 instances, the observational studies were subject to selection biases that not only led to incorrect 

14 results, but, in some cases, to results that pointed to disease protection when the true association 

15 was one of disease causation. 

16 Q: You told us about arguments that conducting randomized trials can be unethical. 

17 Are there ethical considerations that argue in favor of conducting randomized trials of a 

18 treatment even though observational studies appear to show a particular outcome? 

19 A: Yes. If the interest is whether a disease prevention method works, a researcher has an 

20 obligation to conduct the strongest possible study. Otherwise, bias might cause an ineffective 

21 treatment to be accepted or an effective treatment to be rejected. Randomized studies of course 

22 do not always get the right answer; but, if well conducted, their design offers stronger protections 

23 against bias than do other kinds of studies. 
1 Q: You noted earlier that the Farrelly studies were observational studies. Did the


2 authors say why they chose an observational rather than randomized study? 

3 A: Yes, they did. They said it would be unethical to conduct a randomized experiment. 

4 Q: Why did they say it would be unethical? 

5 A: They said that they “could not ethically assign some media markets to low or zero 

6 exposure, given the documented successes of the Florida 'truth' campaign.” (JD-025252 at 425.) 

7 They apparently believed that it would be unethical to assign adolescents in certain geographical 

8 areas to low exposure groups because it would deprive them of the benefits of the “truth” 

9 campaign. 

10 Q: Do you agree? 

11 A: No. 

12 Q: Why not? 

13 A: Their statement is contrary to accepted scientific technique. They are assuming that the 

14 “truth” campaign has been proven to be effective before they studied its effectiveness. What 

15 they referred to as “the documented successes of the Florida 'truth' campaign” were two 

16 observational studies that related only to the first two years of the Florida “truth” campaign. 

17 Q: What two studies on the Florida campaign did the authors point to? (JD-065487 and 

18 JD-065861) 

19 A: They are Sly et al., “The Florida ‘truth’ anti-tobacco media evaluation: design, first year 

20 results, and implications for planning future state media evaluations,” Tobacco Control 

21 2001;10:9-15 (JD-065861) and Bauer et al., “Changes in youth cigarette use and intentions 

22 following implementation of a tobacco control program: findings from the Florida Youth

23 Tobacco Survey, 1998-2000,” JAMA, 2000;284:723-728. (JD-065487)
1 Q: Dr. Wittes, you have reviewed Dr. Healton’s testimony. Do you recall that she

2 testified that, “Our campaign is quite different from the Florida ‘truth’ campaign, so we 

3 did adopt some elements from the ‘truth’ campaign, but we also discarded many of them. 

4 We conduct our campaign in a very different way.” (21015:1-4.) 

5 A: Yes. 

6 Q: What does that suggest about whether the results of the Florida “truth” campaign 

7 would make it unethical to conduct a randomized trial of the national “truth” campaign? 

8 A: If the Florida “truth” campaign was not the same as ALF’s national “truth” campaign, it 

9 could not serve as the basis for concluding that the national campaign could only be successful 

10 and that a randomized study would be unethical. 

11 Q: Dr. Healton also testified that “it took a number of years for us to determine 

12 whether it [ALF’s campaign] was or wasn’t effective. I mean, you don’t find that out in the 

13 first six months.” (20832:19-22) What significance does that have in terms of whether it 

14 would have been unethical to do a randomized trial? 

15 A: It means that when ALF began the study, it had not yet reached the conclusion that its 

16 campaign would inevitably be successful. Since that was the case, there was no ethical 

17 prohibition against randomizing different audiences to different exposures. In fact, since ALF 

18 didn’t know that its program would be successful, I believe it was unethical not to conduct a 

19 randomized trial. If it is feasible, a researcher interested in prevention has an obligation to

20 conduct the strongest possible study. Otherwise, bias might cause an ineffective treatment to be 

21 accepted or an effective treatment to be rejected. 

22 Q: Are you saying that an observational study can never show a causal relationship? 

23 A: No, I am not. But I am saying that relying on observational data to make inferences 
1 about causation is treacherous, as the hormone, beta-carotene, and Vitamin E studies show.

2 Q: Does the scientific literature address this issue? 

3 A: Yes. For example, in 1983, Sylvan B. Green and David Byar gave an influential 

4 presentation to the Proceedings on Evaluation of Therapy entitled, “Using Observational Data 

5 from Registries To Compare Treatments: The Fallacy of Omnimetrics.” 

6 Q: Is JD-025242 a copy of their presentation? 

7 A: Yes, it is. The Proceedings of this Workshop were published in a journal called Statistics 

8 in Medicine. 

9 Q: Referring you to the second page of this document, were you an editor of the 

10 Proceedings?

11 A: Yes, I was. 

12 Q: Who are Drs. Green and Byar? 

13 A: Very highly regarded biostatisticians. At the time of this presentation, they were at the 

14 National Cancer Institute. Dr. Byar is no longer living. Dr. Green is now at the Arizona Cancer 

15 Center. 

16 Q: What did they say about the ethics of randomized trials and observational studies? 

17 A: They said, “There are certainly many examples of non-randomized studies, whether 

18 purely observational studies or historically controlled trials, that have arrived at the correct 

19 answer to a treatment question. However, if because of methodological issues and the possibility 

20 of bias, the investigators fail to convince others of the validity of the results, then they may very 

21 well have wasted their resources. Conversely, if the investigators arrive at an incorrect 

22 conclusion because of bias, then any success they have in convincing others of their results 

23 represents a definite disservice to medical research. This ethical problem is perhaps more 
1 compelling than any of the ethical issues raised against randomization.” (p. 361.)

2 Q: Earlier you told us about the arguments made by some that it was unethical for the 

3 Women’s Health Initiative to conduct a randomized study on post-menopausal hormone 

4 use. Is the claim in the Farrelly paper that it would be unethical to conduct a randomized 

5 study similar to those arguments? 

6 A: Yes, it is, but I should point out that in the hormone case there were multiple 

7 observational studies on heart disease, as well as a study that found that hormones had a 

8 beneficial effect on cardiac risk factors like lipid levels. So the evidence for benefit of hormones 

9 was quite strong. It was, nevertheless, wrong. Here there were only two observational studies in 

10 one state. 

11 Q: Would it have been feasible to have conducted a randomized trial of the 

12 effectiveness of the “truth” campaign? 

13 A: Yes. In fact, according to the ALF web site, they have conducted a randomized trial. 

14 Q: How did they do it? 

15 A: The ALF website indicates that they conducted a three-year study beginning in 2000

16 called the “American Legacy Longitudinal Tobacco Use Reduction Study” or “ALLTURS.”

17 ALF’s web site states: “The ALLTURS study focuses on four communities that have less than

18 average exposure to the “truth” campaign. Following a baseline survey in these four “low 

19 exposure” communities, two were randomly assigned to receive a dramatic increase in exposure 

20 to the “truth” campaign. This was accomplished by purchasing advertising in the local media. An 

21 increased level of exposure will be maintained for two years.” (JD-025208) 

22 Q: Was that a randomized study? 

23 A: Yes, it was. It was a so-called “cluster” randomization. The study took four local 
1 communities that had low exposure and selected two of them at random to increase the amount

2 of advertising. 
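
[Editorial illustration, not part of the testimony.] The “cluster” randomization described above, in which whole communities rather than individuals are assigned at random, can be sketched in a few lines. The community names, the seed, and the function name `assign_clusters` are hypothetical. The sketch does not reproduce how the ALLTURS assignment was actually carried out, which the record here does not describe.

```python
import random


def assign_clusters(communities, n_treated, seed=None):
    """Randomly pick n_treated whole communities for increased exposure."""
    rng = random.Random(seed)
    treated = set(rng.sample(communities, n_treated))
    control = [c for c in communities if c not in treated]
    return sorted(treated), control


# Hypothetical placeholders for the four low-exposure communities.
communities = ["Community A", "Community B", "Community C", "Community D"]
treated, control = assign_clusters(communities, n_treated=2, seed=2000)
```

Because the community is the unit that is randomized, every respondent in a community shares the same assignment, and an analysis would ordinarily need to account for that clustering.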

3 Q: When did this randomized study end? 

4 A: The ALF web site says ALLTURS was a three-year study, so if it began in 2000, it would 

5 have ended in 2002. 

6 Q: How does that compare to the study reported in Farrelly (2005)? 

7 A: That also ended in 2002. 

8 Q: Has ALF published the results of its randomized study? 

9 A: As far as I know, no. 

10 Q: Has it released any data from the study? 

11 A: Not that I am aware. None of the data are available on the ALF web site. 

12 Test of Hypothesis 

13 Q: Did the authors of Farrelly (2002) indicate that they specified a hypothesis in 

14 advance, and then tested that hypothesis? 

15 A: Not as far as I can see. That is a serious matter because the validity of probability 

16 statements, or statistical inference, depends upon pre-specifying the hypothesis you propose 

17 to test. But in this paper, the authors do not state what, if anything, their primary hypothesis may 

18 have been before they began analyzing the data. In fact, Dr. Healton admitted in her cross 

19 examination testimony that the authors changed the structure of the analysis after reviewing the 

20 data. 

21 Q: As you mentioned, Dr. Healton testified that ALF didn’t start out trying to compare 

22 the effects of “Think. Don’t Smoke” to those of “truth.” She stated: “That’s how it 

23 worked out. I think it [awareness of “Think. Don’t Smoke”] began as a control variable 
1 and when they saw the results, they realized there was a real issue.” (20843:6-8.) She also 

2 stated: “I would hasten to add, that we didn’t enter this analysis expecting to find what we 

3 found with ‘Think. Don’t Smoke.’ ‘Think. Don’t Smoke’ was involved in this study, as 

4 were all the state campaigns, to control for their influences. We didn’t have any a priori 

5 views about whether they would be negative or positive, we simply wanted to control for 

6 them and we were very startled by the results we found, and I just want to be clear about 

7 that.” (p. 20858:14-21.) 

8 Dr. Wittes, was it scientifically appropriate to make “Think. Don’t Smoke” a 

9 control variable and then to compare the “truth” and “Think. Don't Smoke” campaigns 

10 only after seeing the survey results? 

11 A: No. 

12 Q: Why not? 

13 A: It is a fundamental principle that you cannot change your hypothesis after reviewing your 

14 data. If you find an unexpected result, you might form a new hypothesis to be tested with a new 

15 study, but it is a violation of the scientific method to conclude that you have proven a hypothesis 

16 that you didn’t form until you looked at the data. 

17 Q: Why is that? 

18 A: Because any large amount of data from a study can address many different propositions. 

19 The mathematical chance of finding evidence that appears to support a proposition that you 

20 didn’t set out to test is so large that it is virtually a certainty. When you see that evidence, you 

21 can’t know if you identified a real relationship or a chance finding. 
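
Illustration (not part of the testimony): the point about chance findings can be sketched numerically. The sketch below assumes, purely for illustration, that the candidate relationships are tested independently, each at the conventional 0.05 level, and that none of them is real. The function name and the numbers of tests are hypothetical and are not drawn from the Farrelly papers.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests of
    true null hypotheses comes out nominally significant at level `alpha`.

    Assumption (illustrative only): the tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# Hypothetical counts of relationships examined after seeing the data.
for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: {chance_of_false_positive(k):.3f}")
```

Under these assumptions the chance of at least one nominally significant result rises quickly with the number of relationships examined, which is why a relationship found after looking at the data is treated as hypothesis-generating only.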

22 Q: Is that important to the interpretation of the article? 

23 A: Absolutely. Dr. Healton acknowledged that the results reported in the study were not the 
1 product of testing a hypothesis that was formulated in advance. Instead, they concerned an 

2 unhypothesized relationship that the authors identified only after they had run statistical 

3 analyses. 

4 Q: Why does that impact the results? 

5 A: The basis of statistical inference depends upon testing for those relationships in an a 

6 priori fashion, not after “peeking” at any relationships that a particular set of data may suggest. 

7 You can't claim “statistical significance” if testing is performed in a manner that is inconsistent 

8 with statistical principles. 

9 Q: Are there other indications that the Farrelly 2002 authors did not specify a 

10 hypothesis in advance, or changed the analysis in mid-stream? 

11 A: There are a number of important indications beyond Dr. Healton’s testimony. First, even 

12 though the study reports results of a comparison between “truth” and “Think. Don't Smoke”, the 

13 study wasn’t designed to compare the “truth” and “Think. Don’t Smoke” campaigns. “Truth” 

14 was targeted at ages 12-17, while “Think. Don’t Smoke” was targeted at ages 10-14. The survey 

15 protocol excluded those in the 10-11 age range. Thus the survey included the entire age target for 

16 the “truth” campaign, but excluded 40% of the “Think. Don’t Smoke” age target. In any 

17 measurement of “campaign” effect, the amount of non-response bias created by that difference is 

18 potentially very large, and would invalidate any attempted comparison of campaigns. 

19 Second, with respect to the “attitude” type questions in the survey, the authors themselves 

20 state at page 906 that “the attitudes that relate to the tobacco industry do not represent a test of 

21 the success of [the “Think. Don’t Smoke”] campaign” because it had not been “designed” to 

22 generate the type of “anti-tobacco attitudes” like “Cigarette companies lie” or “I would like to 

23 see cigarette companies go out of business.” But it was precisely those types of questions that 
1 formed a core part of the evaluation of “attitudes” reported in Table 2 of the survey. In fact, the 

2 LMTS surveys asked about a wide variety of attitudes, but the authors reported results only for a 

3 selected sub-cluster of “anti-tobacco industry” attitudes. 

4 Third, the survey actually asked the “confirmed awareness” questions in a different 

5 manner for “truth” than it did for “Think. Don’t Smoke”. The questions were asked in a more 

6 probing and interactive fashion for “truth.” Additionally, the instructions to the interviewer were 

7 to score confirmation of “truth” differently than for “Think. Don't Smoke”. Those differences 

8 lead to potential bias because they would tend to create a directional, systematic difference. It is 

9 clearly invalid to test one campaign against another given those important differences in the 

10 survey design. 

11 Fourth, because of the time differences in the onset of the campaigns and the two 

12 surveys, and because the people who responded in LMTS-I were entirely different from the 

13 people who responded in LMTS-II, the authors could not make an “apples to apples” comparison 

14 between “truth” and “Think. Don’t Smoke”. The authors state that “to control for the possibility 

15 that the changes in attitudes are part of a secular trend” they included an indicator variable to 

16 “capture influences on national attitudes.” (p. 902) But indicator variables often do not capture 

17 enough relevant information to control for secular trends. 

18 Fifth, the survey didn’t include questions for all of the “Think. Don’t Smoke” ads. Since 

19 “confirmed awareness” measured the different ads seen, not the frequency or amount of total ads 

20 seen, failure to include all “Think. Don't Smoke” ads precludes any direct comparison of the 

21 campaigns. 

22 Sixth, the surveys included at least 3 variables that could have provided a measure for 

23 “awareness.” These were “aided,” “unaided,” and “confirmed” awareness. The authors report 
no a priori reason why one measure should have been preferred over another, yet the results that 
link to attitudes and intention rely exclusively on the “confirmed awareness” measure. 

Q: Why do those examples matter, in terms of the validity of the results of the 2002 

paper? 

A: For several very important reasons. Whatever the original and still unstated a priori 

hypothesis may have been, Dr. Healton confirmed that it was changed after the investigators saw 
what they perceived as a pattern in the results. Under those circumstances, there is no basis to 
assert that the test of any revised hypothesis is statistically significant. The paper does not 
mention the change in hypothesis. Second, the questionnaire strongly suggests that the survey 
design was never structured to make a direct comparison of the campaigns. Nonetheless, after 
looking at analyses of the survey data, the authors changed their analyses to pursue that approach 
because of what they saw. With those types of built in potential biases, it is hardly surprising 
that selective reviews of their data would point to a difference. Finally, because the authors 
never specified any a priori hypothesis, we still don’t know how many of the other various 
combinations of “options,” such as the awareness metric or the smoking intention outcome, the 
authors examined before they chose and reported particular relationships. 

Q: Have you published on this principle? 

A: Yes, I have. I was a co-author of a paper in JAMA that made that very point. 

Q: Is JD-025241 the article you just mentioned? 

A: Yes, it is from JAMA in 1991 and is entitled, “Analysis and Interpretation of Treatment 

Effects in Subgroups of Patients in Randomized Clinical Trials,” by Salim Yusuf, Janet Wittes, 
Jeffrey Probstfield and Herma Tyroler. 

Q: What did you say in that article about changing your hypothesis after seeing the 
data? 

2 A: The subject of this paper was how to analyze data on subgroups in clinical trials. A 

3 number of issues arise when you look at results pertaining to subgroups of the group of subjects 

4 being studied. One issue we addressed was the difference between setting out to study subgroup 

5 effects before doing the study (“a priori”) and doing so after seeing the results (“a posteriori”). 

6 We stated: “While postulating subgroup effects a priori allows formal statistical hypothesis 

7 testing, much less credence is due to formulating hypotheses after examining the data. Such 

8 analysis smacks of betting on a horse after the race is over.” (p. 95.) 

9 Q: Does the same principle apply where the change in the study involves something 

10 other than analysis of a subgroup? 

11 A: Yes, it does. The particular issue you are looking at doesn't matter. It is always 

12 treacherous to change or add a hypothesis after seeing the results, and it is almost never 

13 appropriate. Of course, one should look at data, but relationships found after looking are 

14 hypothesis-generating, not hypothesis-confirming. 

15 Measurement of Variables 

16 Q: Dr. Wittes, earlier you said that in assessing any potential causal relationship, it was 

17 important to measure the correct variables, and to measure them accurately. In general, 

18 how did the 2002 article assess exposure to antismoking campaigns? 

19 A: Even though the authors collected data on 3 potential measures of “exposure,” the 

20 principal results they report rely on “confirmed awareness.” They do display “unaided 

21 awareness” in Figure 1 of the article and mention it in the text, but they do not use that measure 

22 for any of the odds ratios they calculate. They also counted the number of unique campaign ads 

23 that the survey respondents could “confirm,” and reported a separate evaluation called “dose.” 
1 Q: Using JD-055219, which is the survey for LMTS-II, could you explain how the study 

2 “confirmed” that a person had “awareness” of a campaign? 

3 A: Yes. For each ad, the survey instrument listed a set of questions. The ones we are 

4 concerned with here are the general question, which came first in the set, and the immediate 

5 follow-up question. To illustrate, Section D of the survey asks about specific television ads. The 

6 first question, D1, asks about a particular “truth” ad: “Have you recently seen an anti-smoking or 

7 anti-tobacco ad on TV that shows young people bungee jumping off of a bridge?” Kids who 

8 answered either “yes” or “maybe” were then asked Question D2, which was, “What happens in 

9 this ad?” Anyone who could provide a confirming detail was counted as having “confirmed 

10 awareness” of that ad. To have confirmed awareness of the campaign, a respondent had to have 

11 confirmed awareness of at least one ad of that campaign. It doesn’t matter how many other ads 

12 the kid could identify; as long as he or she had confirmed awareness of one ad in the campaign, 

13 the kid was counted as having confirmed awareness of the campaign. But don’t forget that 

14 “confirmation” was actually done differently for the two campaigns, both in terms of how the 

15 surveyor probed for a confirming response, and what type of response counted as confirmation. 

16 Q: How was “dose” determined? 

17 A: “Dose” was the count of different campaign ads for which a respondent confirmed 

18 awareness. 

19 Q: Is that an appropriate measure of “dose,” as that term is normally used? 

20 A: No, it’s a rather unusual measure of “dose.” Dose customarily measures amount, 

21 intensity, or duration, not “diversity.” Yet, according to the definition these authors used for 

22 “dose,” a kid who saw campaign ad “A” 10 times and campaign ad “B” 10 times, would be 

23 classified as having the same “dose” as a kid who saw ad “A” once and ad “B” once. Thus the 
1 study could measure amounts of exposure that differed by a factor of 10 or more as equal 

2 “doses.” Moreover, even though the authors say several times that “dose” was the “total number 

3 of advertisements seen,” that could not possibly be correct. The total or cumulative number of 

4 times a respondent saw any particular ad, or all ads as a group, wasn’t even data the survey 

5 collected. 
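
Illustration (not part of the testimony): the contrast between "dose" as a count of distinct ads and dose as a total amount of exposure can be sketched as follows. The viewing counts are hypothetical; as the answer above notes, the survey did not collect the number of times a respondent saw an ad. The function names are illustrative and do not come from the papers.

```python
from typing import Mapping


def distinct_ad_dose(viewings: Mapping[str, int]) -> int:
    """'Dose' as the testimony describes the authors' measure:
    the number of different ads seen at least once."""
    return sum(1 for count in viewings.values() if count > 0)


def total_exposures(viewings: Mapping[str, int]) -> int:
    """Dose in the customary sense of amount: total number of viewings.
    Hypothetical here; the survey did not record this quantity."""
    return sum(viewings.values())


# Two hypothetical respondents from the example in the testimony.
heavy_viewer = {"A": 10, "B": 10}
light_viewer = {"A": 1, "B": 1}

print(distinct_ad_dose(heavy_viewer), distinct_ad_dose(light_viewer))
print(total_exposures(heavy_viewer), total_exposures(light_viewer))
```

Both hypothetical respondents receive the same distinct-ad "dose" even though their total exposure differs tenfold.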

6 Q: What were the outcome measures in the 2002 paper? 

7 A: There were two groups of outcomes. The first group was a particular subset of attitudes 

8 that the survey had asked about. The second was related to a question that asked, “Do you think 

9 you will smoke a cigarette at any time during the next year?” 

10 Q: Let’s start with the attitudes. Do the attitude results provide valid evidence that 

11 “truth” exposure impacts smoking? 

12 A: No. In a cross-sectional survey, you only have responses from one moment in time, so 

13 you cannot determine a time sequence. But “causes” happen in a particular time sequence: in 

14 order to be a cause, the exposure must precede the effect. Under this study architecture, 

15 however, you can’t determine which came first, the attitude or the exposure. There is no way to 

16 know whether the teens who already had “anti-industry” attitudes were more likely to recall 

17 exposure to “truth” ads, or more likely to watch them, than those without “anti-industry” 

18 attitudes. We can’t tell if the kids who said “cigarette companies lie” said that because they saw 

19 “truth” ads, or because that was their preexisting belief, or because kids who don’t like cigarette 

20 companies are more likely to recall “truth” ads. The authors acknowledge this basic problem 

21 regarding their research design. 

22 Second, as I mentioned earlier, there is no basis to compare “truth” and “Think. Don’t 

23 Smoke” on the attitudes selected, because one campaign was attempting to send an “anti- 
1 industry” message and the other was not. This problem is compounded by the fact that the 

2 survey asked about a broader group of attitudes, yet the authors reported on only this particular 

3 subset. Because no specific hypothesis was stated in advance, and only a select subset of 

4 particular attitudes are reported on, the meaning of the p-values is highly questionable. 

5 Additionally, this study design cannot test whether those attitudes are, in fact, associated 

6 with a change in smoking behavior. Attitudes don't necessarily reflect changes in behavior. 

7 And reported attitudes don't necessarily reflect a person’s actual attitude. 

8 Q: What results did the authors report about smoking intentions? 

9 A: Their results are set forth in Table 2, and I have included below the row in which the 

10 authors report an odds ratio for responding to a particular question about smoking intention, 

11 based on either “confirmed awareness” or “dose.” 



                                    Confirmed Awareness, OR (P)       Dose,(a) OR (P)
                                    “truth”          TDS              “truth”          TDS

Do you think that you will smoke
a cigarette at any time during
the next year? (c)                  1.657 (0.088)    0.644 (0.050)    1.076 (0.347)    0.770 (0.017)


12 (a) Number of advertisements seen 

13 (c) Definitely or probably not 

14 

15 Those results show what their regression model estimated as the odds ratios for 

16 “confirmed awareness” and “dose” for the “truth” and “Think. Don’t Smoke” campaigns, with 

17 respect to the question about smoking intention. 

18 Q: What are the “odds ratios” that are being reported? 

19 A: Without getting too technical, “odds” are the probability that something is so, divided by 

20 the probability that it is not so. For example, if the chance of rain tomorrow morning is 50%, the 

21 odds are .5/.5. We say the odds of rain in the morning are “even,” or 1 to 1. If the chance of rain 
1 tomorrow night is 75%, we calculate the odds as .75/.25, or odds of 3 to 1. The “odds ratio” is 

2 just the ratio of two odds, so the odds ratio of rain tomorrow morning, compared to tomorrow 

3 night, is 1/3, or .33. Oversimplifying slightly, Table 2 reports whether the kids with “confirmed 

4 awareness” or a higher “dose” were more or less likely to respond to particular attitude questions 

5 or the question about smoking intentions than kids who were not exposed. 
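
Illustration (not part of the testimony): the rain example can be worked through in a few lines. The function names are illustrative; the probabilities are the ones used in the answer above.

```python
def odds(probability: float) -> float:
    """Odds = probability that something is so, divided by the
    probability that it is not so."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1.0 - probability)


def odds_ratio(p_first: float, p_second: float) -> float:
    """Ratio of the odds of the first event to the odds of the second."""
    return odds(p_first) / odds(p_second)


morning_odds = odds(0.50)                  # 50% chance of rain: even odds
night_odds = odds(0.75)                    # 75% chance of rain: 3 to 1
morning_vs_night = odds_ratio(0.50, 0.75)  # morning compared to night

print(morning_odds, night_odds, round(morning_vs_night, 2))
```

An odds ratio below 1 means the first event has lower odds than the second; an odds ratio above 1 means the reverse.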

6 Q: From simply a statistical viewpoint, how do we interpret the odds ratios about 

7 smoking intentions? 

8 A: It depends upon the specific responses that are being compared. For instance, the 

9 “confirmed awareness” questions were linked to the likelihood that a teen would respond that he 

10 thought he would “definitely not” or “probably not” smoke in the upcoming year. In that case, 

11 an odds ratio above 1.0 would mean that kids with confirmed awareness were more likely to 

12 respond that they thought they would definitely or probably not smoke, compared to kids who 

13 said they did not have confirmed awareness of the campaign. 

14 Q: Assuming there were no problems with the study, how does one interpret the odds 

15 ratios for the smoking intention question? 

16 A: The “truth” campaign had an OR of 1.657, meaning a higher percentage of those with 

17 “confirmed awareness” of “truth” disagreed or strongly disagreed that they would smoke in the 

18 next year. The p-value was 0.088, so the result wasn’t statistically significant. But that test lost 

19 any meaning after the authors changed their hypothesis because their data suggested a 

20 relationship with “Think. Don’t Smoke” that they didn't anticipate or hypothesize. The OR for 

21 “Think. Don't Smoke” was 0.644 with a p-value of 0.050, so it is nominally statistically 

22 significant. But again, the hypothesis test was invalid because it was formulated in response to 

23 relationships the authors saw in the dataset. There were also other important problems, like 
1 omitting “Think. Don’t Smoke” ads, and asking for confirmation of “truth” awareness in a more 

2 probing way than for “Think. Don’t Smoke”. These problems render the results basically 

3 uninterpretable. 

4 Q: What were the results relating campaign “dose” to intention to smoke in the next 

5 year? 

6 A: The result for “truth” was not statistically significant, but the article reports that “Think. 

7 Don't Smoke” had an OR of 0.770 (p=0.017). 

8 Q: Do the same problems you just mentioned apply to the “dose” odds ratios as well? 

9 A: Yes. There are additional problems. Remember “dose” doesn’t really mean dose as 

10 we ordinarily use the word, and the comparison method was slanted against the “Think. Don’t 

11 Smoke” campaign because of the omitted ads and the way “dose” was calculated. 

12 Q: Are there other significant problems with this study as well? 

13 A: Yes, there are. 

14 Q: Before I ask you about those other problems, I want to ask about the magnitude of 

15 the odds ratios that the authors estimated. Putting aside the problems we have discussed, 

16 how would you characterize the magnitude of the association they estimate in this study? 

17 A: These were all weak associations. Considering the study architecture, the bias we can 

18 confirm, the potential for other bias, and the high nonresponse rate, I would disregard the 

19 estimated odds ratios for that reason alone. 

20 Q: Considering the magnitude of the association that the authors report, the study 

21 architecture and methods, and the problems you have identified thus far, of what value is 

22 this study to the question of whether exposure to “truth” impacts the behavior of 

23 nonsmoking 12-17 year olds? 
1 A: It is of no assistance in providing an answer to that question. 

2 Q: You mentioned that there were other problems with the Farrelly 2002 study. What 

3 problems were you referring to? 

4 A: The survey weights create unstable estimates. Also, even though the authors collected 

5 data on them, the authors left out of their model highly important variables that are known to 

6 impact smoking behavior. There are also a number of other important biases. 

7 Q: Let’s begin with the study weights. First, what are study weights? 

8 A: Study weights are numbers used to correct for the unequal selection probability among 

9 the people in a sample. ALF heavily oversampled various ethnic and racial minorities in both 

10 LMTS I and II. In order to provide an unbiased estimate of the population the sample was 

11 intended to represent, minorities had to be “weighted down,” and whites had to be “weighted 

12 up.” 

13 Q: How did that create a problem? 

14 A: I can only describe some of the consequences. ALF hasn’t made the details of its 

15 weighting procedure public, so I can’t describe everything that led to these consequences. 

16 Q: What do you know about how the weights were set? 

17 A: The authors state that the weights depend on age, race/ethnicity, and residence in states 

18 with funded countermarketing campaigns. The data, which I examined, reveal that respondents 

19 were also stratified into nine groups. Within the nine strata there were further adjustments for 

20 factors like race and ethnicity. 

21 Q: What is a stratum? 

22 A: A stratum is a subpopulation. We often divide our study population into strata that reflect 

23 relatively “homogeneous” subgroups that are basically the same on one or more characteristics 
1 related to the outcome variable. The goal is also to have each stratum be different from the 

2 others. 

3 Q: What characteristics did the authors use to stratify the population into the nine 

4 subpopulations? 

5 A: The authors don’t describe the stratification in the article, and it is unclear from the data 

6 or the codebook. The strata don’t appear to be defined by race, ethnicity, age or any other 

7 identifiable demographic factor, at least not directly, because those are the factors that drove the 

8 study weights, and quite different variations of weights are represented among the 9 strata. 

9 Q: Do you recall that Dr. Healton was asked about the nine strata and testified that the 

10 criteria used to define the strata were race and ethnicity. She said, “There were nine strata 

11 so we could determine the impact of the campaign on different minority groups and 

12 different ethnic groups.” (p. 20902:23-25.) In addition, asked to list the nine 

13 subpopulations, she said: “Off the top of my head I can’t name them all, but it would be 

14 the nine most prominent racial ethnic groups in the country.” (p. 20903:2-9.) Asked to 

15 provide the name of one such subpopulation, she said, “African Americans.” (p. 20903:17- 
16 19.) 

17 A: Yes. 

18 Q: Have you reviewed the LMTS surveys and the data on ALF’s web site to determine 

19 whether the nine strata were defined by race and ethnicity? 

20 A: Yes, I have. 

21 Q: What did you find? 

22 A: I found that they were not defined by race and ethnicity. 

23 Q: How do you know that? 
1 A: I know that for two reasons. First, the survey obtained information on only seven racial 

2 and ethnic groups, not nine. The second reason I know that Dr. Healton is incorrect is that each 

3 stratum contains members of more than one racial and ethnic group. For example, there was no 

4 “African American” stratum because African Americans were contained in all the strata. 

5 Q: How did the weighting of members of the same racial and ethnic group compare in 

6 the different strata? 

7 A: They were different, often very different. For example, African Americans are weighted 

8 differently depending on which stratum they were in. It didn’t matter if they were the same age 

9 or gender. In other words, there was something about the strata that affected the weighting. 

10 Q: Does Dr. Healton’s testimony assist you in trying to determine the criteria for 

11 dividing the respondents into the nine strata? 

12 A: No, it doesn’t because her explanation is clearly in error. 

13 Q: What is the basis for your concern about the study’s weighting method? 

14 A: The differences in weights were enormous. In LMTS-I, weights ranged from a high of 

15 over 71,000 to a low of about 6, and in LMTS-II from a high of over 45,000 to a low of under 

16 18. 

17 Q: What do those weights mean? 

18 A: They are the number of people in the general population that a particular person in the 

19 LMTS sample “represents.” In other words, some respondents in the first survey represented 

20 more than 70,000 people in the general population, and some represented fewer than 6. There 

21 was a high to low range of about 12,000 to 1 in the weightings in the first survey and about 2600 

22 to 1 in the second. 
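
Illustration (not part of the testimony): the high-to-low ranges follow from dividing the largest weight by the smallest. The inputs below are the rounded bounds stated in the testimony ("over 71,000," "about 6," "over 45,000," "under 18"), not the exact weights in the LMTS data files, so the results are approximations of the figures quoted above.

```python
def weight_range_ratio(highest: float, lowest: float) -> float:
    """How many times larger the biggest survey weight is than the smallest."""
    if lowest <= 0:
        raise ValueError("weights must be positive")
    return highest / lowest


# Rounded bounds taken from the testimony; the exact weights are not given.
lmts_1_ratio = weight_range_ratio(71_000, 6)
lmts_2_ratio = weight_range_ratio(45_000, 18)

print(f"LMTS-I:  about {lmts_1_ratio:,.0f} to 1")
print(f"LMTS-II: about {lmts_2_ratio:,.0f} to 1")
```

Because the testimony gives only bounds, the second ratio computed here is a lower bound on the range, consistent with the "about 2600 to 1" figure.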

23 Q: Why do the differences in the study weights matter in assessing the validity of the 
1 study? 

2 A: Weights can have very important effects on estimates from surveys. The greater the 

3 difference in study weights, the more cause for concern one should have about the possibility 

4 that the weights are affecting the results. 

5 Q: How can weighting impact the results? 

6 A: The people with dramatically higher weights have a lot of influence on the results. It 

7 would take only small changes in the heavily weighted group to alter the outcome of the study. 

8 In fact, in the entire LMTS-I survey, if we change the “confirmed awareness” status of only two 

9 of the highest weight respondents, the odds ratio for “Think. Don’t Smoke” changes direction 

10 and becomes not statistically significant. 

11 Q: Why is that important? 

12 A: It shows how remarkably sensitive this study is to the weights. A study should not 

13 depend crucially on the responses of just a few individuals. 
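
Illustration (not part of the testimony): a toy example of how a few heavily weighted respondents can drive a weighted odds ratio. Every number below is hypothetical and chosen only to make the mechanism visible; none of it is LMTS data, and the sketch does not reproduce the authors' regression model or the witness's analysis.

```python
from typing import List, Tuple

# (survey weight, has confirmed awareness, gave the favorable response)
Respondent = Tuple[float, bool, bool]


def weighted_odds_ratio(sample: List[Respondent]) -> float:
    """Odds ratio from a weighted 2x2 table of awareness by response."""
    cells = {(a, r): 0.0 for a in (True, False) for r in (True, False)}
    for weight, aware, favorable in sample:
        cells[(aware, favorable)] += weight
    return (cells[(True, True)] * cells[(False, False)]) / (
        cells[(True, False)] * cells[(False, True)]
    )


# Eight hypothetical low-weight respondents, two in each cell of the table.
low_weight = [
    (10.0, aware, favorable)
    for aware in (True, False)
    for favorable in (True, False)
    for _ in range(2)
]

# Two hypothetical high-weight respondents, before and after their
# awareness status alone is switched.
high_weight_aware = [(1000.0, True, False)] * 2
high_weight_unaware = [(1000.0, False, False)] * 2

or_before = weighted_odds_ratio(low_weight + high_weight_aware)
or_after = weighted_odds_ratio(low_weight + high_weight_unaware)
print(f"before: {or_before:.4f}  after: {or_after:.1f}")
```

In this toy sample, reclassifying two of ten respondents moves the weighted odds ratio from well below 1 to well above 1, because those two carry almost all of the weight.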

14 Q: Are there any other anomalies or problems with the weights? 

15 A: Yes. Some of my analyses suggest a problem either with the weights themselves or 

16 the sample size - not with the entire sample, but with the size of certain subgroups or cells. It 

17 seems probable the manner in which the race weightings were calculated compromised the 

18 study’s ability to examine smoking trends. 

19 Q: Why is that? 

20 A: In LMTS-I, the weights were set so that 12-17 year old non-smokers represented a 

21 population of 27 million non-smokers, 74% of whom were white. In LMTS-II, the 12-17 year 

22 old non-smokers represented a population of 21 million nonsmokers, 61% of whom were white. 

23 Q: How does that illustrate a problem with the weights? 
1 A: According to the weights, the represented population experienced a very significant 

2 downward trend in nonsmokers, in other words, an increase of about 6 million smokers aged 12- 

3 17. This occurred at a time when smoking was trending downward, not upward, in that group. 

4 Second, the racial makeup of 12-17 year olds could not possibly have changed by that amount. 

5 Tellingly, the unweighted percentages are not very different between the groups, which points to 

6 the weighting as the source of the problem. 

7 Failure to Control for Important Confounding Factors 

8 Q: Let’s go to your next criticism, the failure to include some important predictors in 

9 the model; what do you mean by that? 

10 A: During this time period, smoking decreased among youths, so it is particularly important 

11 to separate out factors unrelated to the campaigns that may have affected adolescent attitudes and 

12 intentions toward smoking. In their regression models, the authors controlled for some potential 

13 confounders like race and gender, but failed to control for others that are known to be important, 

14 for example, exposure to smokers, sensation-seeking, and exposure to other information that may 

15 affect attitudes towards smoking. Without that form of control in the regression, the odds ratios 

16 are potentially biased. 

17 Q: Are there any potential confounders where the included control appears to be 

18 inadequate? 

19 A: One was local tobacco control spending. The authors controlled for state, but not local, 

20 spending. 

21 Q: Why is that important? 

22 A: Because there is an indication that the areas with the heaviest exposure to the “truth” 

23 campaign were the ones with the highest amount of local tobacco control spending. 
1 Q: Can you identify JD-025219? 

2 A: Yes, it is the minutes of a board meeting of ALF on September 26, 2002. 

3 Q: Does this document contain an indication of just what you were saying? 

4 A: Yes, it does. On page 7, Dr. Farrelly states that “truth” exposure is high in large urban 

5 markets and that a limitation to his studies was local tobacco control policies and activities in 

6 areas with high truth exposure. 

7 Q: Did the authors control for smoking by peers? 

8 A: No. In my analysis of the LMTS data, I ran some models to estimate the association 

9 between whether any friend smoked and intention to smoke in the next year. The p-value for the 

10 1 df chi-square variable measuring that association, regardless of the model, was amazingly 

11 small, around 10^-16. 

12 Q: What does 10^-16 mean? 

13 A: It is the same as a number that has 15 zeros after the decimal point and then a 1, or 

14 0.0000000000000001. This is less than one millionth times one billionth. This association is 

15 clearly not random. Compare that to what is standardly used for statistical significance, 0.05 or 

16 one in twenty. This is a variable about which the survey collected data; the authors should have 

17 controlled for it in their regression. 
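
Illustration (not part of the testimony): the size of a number like 10^-16 can be checked directly. The value below is the approximate p-value stated in the answer, not a recomputation from the LMTS data.

```python
p_value = 1e-16               # approximate p-value stated in the testimony
significance_threshold = 0.05  # conventional level, one in twenty

# Write the number out in fixed-point form to count the zeros.
written_out = f"{p_value:.16f}"
fraction_digits = written_out.split(".")[1]
leading_zeros = len(fraction_digits) - len(fraction_digits.lstrip("0"))

one_millionth_times_one_billionth = 1e-6 * 1e-9

print(written_out)
print("zeros after the decimal point:", leading_zeros)
print("below 0.05:", p_value < significance_threshold)
print("below one millionth times one billionth:",
      p_value < one_millionth_times_one_billionth)
```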

18 Q: Have you seen Dr. Healton’s explanation for why they didn’t control for peer 

19 smoking in the 2005 study? 

20 A: Yes, I have. She says that they did not do so because the “truth” campaign might have 

21 affected both the subjects’ smoking and their friends’ smoking. 

22 Q: How do you react to that? 

23 A: She is partially correct. In terms of the 2005 study one might hypothesize that “truth” 
1 might have affected both the subjects’ smoking and friends’ smoking. But that doesn’t apply to 

2 the 2002 study. 

3 Q: Why wouldn’t it apply to the 2002 study? 

4 A: Because the outcome variable was not smoking, but intention to smoke. And even if it 

5 could apply to this study, that would merely mean that the authors should have used an 

6 experimental study design. It doesn’t excuse the failure to control for friends’ smoking. I plan

7 to address this subject when I discuss the 2005 paper. 

8 Nonresponse Bias 

9 Q: You mentioned that there was a problem with the response rate; would you 

10 elaborate on that? 

11 A: Yes. The response rate for LMTS-I was only 52.5% and for LMTS-II was 52.3%, 

12 meaning that both surveys did not reach nearly 50% of the people they attempted to reach. That 

13 very low response rate by itself casts doubt on the findings because of the potential that the

14 people who did not respond differed in relevant ways from those who did. With a nonresponse 

15 percentage that large, it would not take big differences between those who responded and those 

16 who did not to change the reported results significantly. 
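
A minimal sketch of why a large nonresponse share matters, assuming a simple weighted-average model of the population. The 52.5% response rate comes from the testimony; the smoking-intention rates (10%, 12%, 14%) are invented purely for illustration, and `overall_rate` is a hypothetical helper.

```python
def overall_rate(resp_rate, rate_responders, rate_nonresponders):
    """Population rate as a weighted average of responders and nonresponders."""
    return resp_rate * rate_responders + (1.0 - resp_rate) * rate_nonresponders

RESPONSE_RATE = 0.525      # LMTS-I response rate stated in the testimony
OBSERVED_RATE = 0.10       # hypothetical rate among those who responded

# Hypothetical rates among nonresponders; none of these values is from the study.
for nonresponder_rate in (0.10, 0.12, 0.14):
    population_rate = overall_rate(RESPONSE_RATE, OBSERVED_RATE, nonresponder_rate)
    print(nonresponder_rate, round(population_rate, 4))
```

Because nearly half of the weight falls on the unobserved group, even a modest difference in that group moves the population figure noticeably.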

17 Q: Is a nearly 50% nonresponse rate uncommon? 

18 A: It often happens in cross-sectional telephone surveys, but that doesn’t make it any less a 

19 problem. The issue is the size of the nonresponse in comparison to the size of the proposed 

20 relationship, as well as what we know about how non-respondents may be different. 

21 Q: Can you identify JD-025217? 

22 A: That is a chapter called, “Reference Guide on Survey Research,” by Shari Seidman 

23 Diamond, J.D., Ph.D., Professor of Law and Psychology at Northwestern University and Senior 
1 Research Fellow at the American Bar Foundation. It is taken from the Reference Manual on

2 Scientific Evidence published by the Federal Judicial Center. 

3 Q: If you would look at page 245, what does Professor Diamond say about the response 

4 rate in surveys? 

5 A: She says: 

6 One suggested formula for quantifying a tolerable level of nonresponse in a 

7 probability sample is based on the guidelines for statistical surveys issued by the 

8 former U.S. Office of Statistical Standards. According to these guidelines, 

9 response rates of 90% or more are reliable and generally can be treated as random 

10 samples of the overall population. Response rates between 75% and 90% usually 

11 yield reliable results, but the researcher should conduct some check on the 

12 representativeness of the sample. Potential bias should receive greater scrutiny 

13 when the response rate drops below 75%. If the response rate drops below 50%, 

14 the survey should be regarded with significant caution as a basis for precise 

15 quantitative statements about the population from which the sample was drawn. 

16 

17 Q: Do you agree with that? 

18 A: Yes, I do, but again, the size of the proposed relationship is important, as well as what we 

19 know, and don’t know, about the ways in which the nonresponders may differ. 

20 Q: Does the 2002 paper indicate that the authors regarded the survey with significant 

21 caution as a basis for precise quantitative statements? 

22 A: No, it doesn't. After mentioning the response rate, the authors don't discuss it further. 

23 Q: What can we determine about those who did not respond? 

24 A: The authors state they made 12 callback attempts to reach both the adolescents and their 

25 parents at home. They spread their attempts across all days of the week and times of day. And 

26 they made up to two attempts to persuade respondents to participate unless the parents or 

27 adolescents were adamant in their refusal. 

28 Q: Isn’t that an adequate attempt to reach as many respondents as possible? 

29 A: I believe they made an adequate, even admirable, attempt to reach respondents, but not an 
1 adequate effort to determine if the low response rate might have affected the outcome or how the

2 nonresponders may differ from the responders. 

3 Q: Is there any reason to believe that response rates in the Farrelly (2002) study varied 

4 according to the characteristics of the potential respondents? 

5 A: Yes. Response rates typically vary considerably by race, ethnicity, and social class, 

6 which are all characteristics that are associated with smoking behavior. 

7 Q: Could the authors have attempted to determine if the low response rate affected the 

8 outcome of their study? 

9 A: One way that is often used is to compare the responses of people who were reached based 

10 on how many attempts it took to reach them. We know that the authors made up to 12 attempts 

11 to reach people. They could have estimated trends in the response rate according to the number 

12 of attempts necessary to reach people. They then could have extrapolated those results, on the 

13 basis of the estimated trend, to the 50% who never responded. That would have provided at least 

14 some adjustment for the very large nonresponse rate. The authors should either have made

15 an adjustment or acknowledged that the low response rate was a significant weakness in the 

16 study and casts doubt on the validity of their findings. 
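
The callback-trend adjustment described in this answer can be sketched as follows. This is one simple way to do it (an ordinary least-squares line), not a method the paper or the testimony specifies; the attempt counts and rates are invented, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: outcome rate among respondents first reached on attempt k.
attempts = [1, 2, 3, 4]
rates = [0.10, 0.11, 0.12, 0.13]

slope, intercept = fit_line(attempts, rates)

# Treat never-reached households as if they lay beyond the 12th attempt.
extrapolated = intercept + slope * 13
print(round(slope, 4), round(intercept, 4), round(extrapolated, 4))
```

A trend near zero would suggest that hard-to-reach respondents resemble easy-to-reach ones; a steep trend would suggest that the nonresponders may differ materially.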

17 Q: Did Farrelly (2002) do that? 

18 A: Not according to the published paper. 

19 Confirmed Awareness

20 Q: I’d like to return to “confirmed awareness.” You said that “confirmed awareness” 

21 of the campaign was based on having “confirmed awareness” of at least one ad in the 

22 campaign; did the article indicate how many ads the survey asked about? 

23 A: Yes. The authors said that they asked about all advertisements from both campaigns that 
1 aired within six weeks of the survey’s start.

2 Q: JD-055088 is a letter from Carolyn Levy of Philip Morris USA to Dr. Lyndon 

3 Haviland of ALF. What does that letter indicate about whether these were all the ads of 

4 these campaigns that ran in the six weeks before these surveys began? 

5 A: Dr. Levy states that the survey asked about only two of the nine TDS ads that were being 

6 run during that period and that two of the ads that were included in the survey were not being run

7 then. She also states that, based on Philip Morris USA’s allocations, the survey missed 62% of 

8 its ad allocations during the fall of 2000. She suggests that this might have biased the results. 

9 Q: Do you have any other reason to believe that LMTS-II did not ask about all of the 

10 TDS ads? 

11 A: Yes. 

12 Q: What are the other reasons? 

13 A: ALF explains in its First Look Report 9, available on its website (JD-025203), that “TDS 

14 ads in the survey were chosen based on reports from a commercial monitoring service and may 

15 not capture all of the campaigns’ ads.” (JD-025203 at 15 n. 3). In that same report, ALF states 

16 that “TDS ads were chosen based on ad tracking information from Video Monitoring Service.” 

17 (JD-025203 at 31). In addition, information from the same service that ALF reports was used to 

18 determine which TDS ads should be in the survey, shows that more than four TDS ads were 

19 airing at the time. 

20 Q: In her cross-examination, Dr. Healton was shown a report from Video Monitoring 

21 Service (“VMS”), JD-055410 [VMS Report] that appears to report Philip Morris youth 

22 smoking prevention advertisements airing during the months of July, August, and 

23 September 2000. (Trial Tr. at 21823:9-15). Did you try to confirm the information shown 
1 on this report?

2 A: Yes. After learning of the report and Dr. Healton’s testimony, my office contacted VMS

3 to confirm what type of information VMS provides, how they prepare reports, and how they 

4 prepared JD-055410, in particular. Chris Deagazio of VMS’s New York office reported to my 

5 office that VMS generates JD-055410 and reports of that nature in two steps. First, someone like 

6 him goes to a database to identify ads bought by a specific advertiser, in this case, Philip Morris 

7 USA (or whichever advertiser the requester is interested in). Second, VMS takes that list and

8 compares it against an obviously much larger database to identify all of the times at which the 

9 ads aired during a given time period, in this case three months. This type of report is called an 

10 occurrence report, and JD-055410 is an example of such a report, according to VMS. Mr. 

11 Deagazio said he could think of no reason why the information in JD-055410 would not have 

12 been available in 2000. A review of VMS’s website (www.vmsinfo.com) shows that occurrence 

13 reports of this nature are “standard reports” under the Advertising Monitoring area of its website. 

14 The website also says on the same page that it maintains “occurrence and spending data from the 

15 past 20 years.” 

16 JD-055410 shows that 15 Philip Morris USA ads aired nationally from July 28 through 

17 September 7, 2000. I understand that not all of these ads are TDS ads, but in reviewing JD- 

18 050804 [Creative History with Storyboards] , which contains storyboards of TDS advertisements, 

19 it appears that 10 of the ads described in the VMS Report correspond to the pictures and script in 

20 the storyboards. Although I obviously do not know for certain the specific number of ads aired 

21 or when they aired, the correspondence between ALF and Philip Morris USA, the statements in 

22 the First Look Report 9, and the information in the VMS report make it likely that more TDS

23 ads were running during the six week period preceding the start of LMTS-II than those that were 
1 asked about in the survey.

2 Q: Doctor, is this information of the type reasonably relied upon by experts in your 

3 field in forming opinions and inferences? 

4 A: Yes. As the 2002 article relies upon a selection method for the TDS ads to be asked 

5 about, relying upon information regarding the source of that information is reasonable. 

6 Q: Would that have biased the results? 

7 A: Yes. 

8 Q: Why? 

9 A: The failure to include all the ads means that the article’s statement that the survey covered

10 all the ads of the two campaigns that ran within six weeks of the survey’s start was not true, and that

11 failure clearly had the potential to bias the study. Think about how “confirmed awareness” of the

12 campaign was determined. A respondent merely had to have confirmed awareness of one “TDS” 

13 ad to have confirmed awareness of the entire “TDS” campaign. There is no way to know how 

14 many kids who were unaware of the ads included in the survey would have had confirmed 

15 awareness of other ads if they had been included, but every one of those kids would have been 

16 classified as being unaware of the campaign. These respondents were put into what, in effect, 

17 was the study’s control group, when some of them should have been in the group with exposure. 

18 If you think of how the odds ratio is determined (e.g., odds of intention to smoke among those 

19 with confirmed awareness divided by odds of intention to smoke among those without confirmed 

20 awareness), they were put into the denominator instead of the numerator. 
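
A minimal sketch of the misclassification point, using the odds-ratio definition given in the answer. All counts are invented for illustration and are not taken from LMTS-II; `odds_ratio` is a hypothetical helper. The sketch shows only that reclassification changes the estimate, not the direction or size of any change in the actual study.

```python
def odds_ratio(aware_yes, aware_no, unaware_yes, unaware_no):
    """Odds of intending to smoke among the aware divided by odds among the unaware."""
    return (aware_yes / aware_no) / (unaware_yes / unaware_no)

# Hypothetical counts as classified by the survey:
# aware: 60 intend to smoke, 940 do not; unaware: 40 intend, 960 do not.
as_classified = odds_ratio(60, 940, 40, 960)

# Suppose 200 "unaware" respondents had in fact seen ads the survey never asked
# about, and 5 of those 200 intended to smoke. Move them to the aware group.
reclassified = odds_ratio(60 + 5, 940 + 195, 40 - 5, 960 - 195)

print(round(as_classified, 3), round(reclassified, 3))
```

With a result that was barely significant to begin with, a shift of this kind could plausibly move the estimate across the significance threshold in either direction.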

21 Q: Would that have made a difference in the outcome in the “confirmed awareness”

22 category? 

23 A: It is impossible to say that it wouldn’t have. The p-value for “confirmed awareness” of
1 TDS was barely significant at 0.05 as it was, and it is surely possible that if respondents were

2 placed in the correct category, it would have changed the outcome. And, in any event, it means 

3 that the authors used incorrect data and performed incorrect calculations to get their results. 

4 Q: Would that flaw have made a difference in the “Dose” category? 

5 A: My answer is the same. The determination of “dose” was based on the number of 

6 different ads that the respondents had seen, but they were not asked about all of the “TDS” ads. 

7 It is impossible to have confidence in the outcome that the authors reported. 

8 Q: JD-052681 is Dr. Haviland's response to Philip Morris, have you reviewed that 

9 document? 

10 A: Yes. 

11 Q: Did Dr. Haviland disagree about whether they had included all the TDS ads in the 

12 survey? 

13 A: No, she didn’t. 

14 Q: How did she respond to Dr. Levy’s statement about omitting most of the “TDS” 

15 ads? 

16 A: She states: “It is true that the survey did not include every ad that was airing at the time, 

17 yet confirmed awareness was 66%, which is comparable to more recent surveys that include a 

18 more comprehensive set of ads based on information provided by Philip Morris USA Youth 

19 Smoking Prevention staff. The most important point, however, is that there is no reason to 

20 believe that the failure to include all ads in any way biases the conclusions of our study. We 

21 have a reasonable estimate of awareness of ‘Think. Don’t Smoke’ and this awareness has been 

22 shown to be associated with undesirable results.” 

23 Q: Do you agree with her? 
1 A: No, I don’t.

2 Q: Why not? 

3 A: She makes two points, neither of which justifies the conclusions of the study. Her first 

4 point is that the confirmed awareness of TDS was comparable to more recent surveys that 

5 included more TDS ads. But that is irrelevant. If you misclassify respondents, it doesn’t matter 

6 whether the total with exposure is “comparable” to the total in another study. Her second 

7 argument is that the awareness of the respondents was “associated with undesirable results,” but 

8 that was only in comparison with respondents who were potentially misclassified as being 

9 unaware. 

10 Q: When was the 2002 article published in relation to when Dr. Levy sent her letter? 

11 A: Dr. Levy’s letter was written in February 2002 and the article was published in the June 

12 2002 issue of the American Journal of Public Health. 

13 Q: Did the article mention the mistake about the TDS ads? 

14 A: No. As I said, the article incorrectly stated that the survey

15 “included all advertisements from both campaigns aired within 6 weeks of the survey’s start.” 

16 (Farrelly 2002 at 902.). 

17 Q: In your experience do scientific journals allow authors to correct mistakes that they 

18 discover within a few months of the publication date?

19 A: Absolutely. Corrections can be made when the article is in galley proofs. And even if 

20 the authors learn of mistakes after publication, errata can be published. 

21 Q: Would you look at the last page of JD-065578, the Farrelly 2002 article; what is 

22 that? 

23 A: It is an erratum published in the May 2003 AJPH regarding the Farrelly article. 
1 Q: What error was mentioned?

2 A: An error in two confidence intervals. The change didn’t affect the outcome of the study. 

3 Q: Was there any mention that the article had incorrectly reported that all TDS ads 

4 were included in the LMTS-II survey? 

5 A: No, there wasn’t. 

6 Q: I would like to ask you about what Dr. Healton said about whether they had asked 

7 about all the “Think. Don’t Smoke” ads. First, she said, “But I hasten to add, as you know, 

8 that there is an open-ended question in which we say have you seen any other ads, the 

9 theme of which relates to smoking, and then if they say, oh, yes, I saw another one, we ask 

10 them to think of just anything from it, just one tiny thing. And if they say it we do more 

11 probes with them and that’s how they get a confirmed awareness level, and I do want to 

12 fully answer this question.” (20861:23-20862:5.) Have you reviewed the questionnaire to 

13 see if there is such an open-ended question? 

14 A: Yes, I have. 

15 Q: Is there such an open-ended question? 

16 A: No, there isn’t. “Confirmed awareness” is based entirely on first asking about a specific 

17 ad and then asking for another detail about the ad. 

18 Q: If there were such an open-ended question, would that avoid the potential 

19 misclassification due to having asked about so few ads? 

20 A: No, it wouldn't. 

21 Q: Why not? 

22 A: Because there was an asymmetry between the way they asked about the two programs. 

23 In the case of each “truth” ad, the interviewer prompted with a detail about the ad. Asking 
1 whether the respondent could recall any other ad without providing a detail is an entirely

2 different question. 

3 Asymmetry in the Questions About the Two Campaigns 

4 Q: Was there any other problem with the survey instrument in the study? 

5 A: Yes, there was. There was what I would call “asymmetry” in how they asked the

6 question on which confirmed awareness was determined for the two campaigns. In the case of 

7 the “truth” ads when the respondent was asked, “What happens in these ads,” if he or she said, 

8 “truth,” the questioner was to probe for more information, and the respondent got credit for 

9 saying, “truth.” But in the case of the “TDS” ads, there was no probe, and the surveyor was not 

10 instructed to give credit if a respondent simply said, “Think. Don’t Smoke.”

11 Q: Is that a flaw in the study? 

12 A: Yes. If you want to compare two campaigns, you have to ask about them in the same 

13 way. Otherwise, you don’t know if any difference is simply the result of a difference in the way 

14 the question was asked. 

15 Q: As you know, Dr. Healton was asked about the difference between the way the 

16 questions were asked about the two programs and said that they didn't probe about 

17 “Think. Don’t Smoke” because it wasn’t a branded campaign. In your opinion, is that a 

18 legitimate reason not to have probed about “Think. Don’t Smoke?” 

19 A: No, it isn't. 

20 Q: Why not? 

21 A: Whether “Think. Don’t Smoke” is a “brand” or merely a “slogan” has nothing to do with 

22 whether questions should be asked about the campaigns in the same way in order to compare 

23 them. 
1 Q: Dr. Healton was asked if “good prudent practice methodology would say you would

2 want that question to be the same” and answered, “If it was possible to be the same.” 

3 (20973:18-21.) In your opinion would it have been possible for the questions about “truth” 

4 and “Think. Don’t Smoke” to be the same? 

5 A: Yes. The authors would merely have had to include an instruction to probe if the 

6 respondent had said, “Think. Don't Smoke” but no other detail about the ad, and to count the 

7 response in the same way as it was counted for “truth.” 

8 FARRELLY (2005) 

9 Description of Study 

10 Q: Let’s move to the second article, Farrelly (2005)(JD-025252). What was the purpose 

11 of this study? 

12 A: The authors state that it was a study based on “repeated cross sectional surveys” to 

13 “assess[] whether there was a dose-response relationship between the level of exposure to the 

14 campaign and youth smoking prevalence during the first 2 years of the campaign.” (p. 425, 430) 

15 Q: How did they go about trying to do that? 

16 A: They linked two sources: (1) smoking prevalence and control data from two cross 

17 sectional surveys administered by Monitoring the Future among U.S. students in 8th, 10th and 12th

18 grades; and (2) media market data for the “truth” campaign. 

19 Q: At some points, the authors refer to the study as “quasi-experimental.” Do you 

20 agree with that description? 

21 A: I don’t like the terminology “quasi-experimental” - to me, either a study is an experiment 

22 or it’s not. I agree with their description of the study as “repeated cross sectional surveys.” The 

23 authors use data from the Monitoring the Future surveys, and Monitoring the Future sampled the 
1 point prevalence of variables using separate samples in each year. In Monitoring the Future,

2 schools and classrooms are randomly chosen to be surveyed. But since the Farrelly 2005 authors

3 incorporated no randomization or manipulation of their study variables, and instead simply 

4 observed students across various GRP levels, I can’t agree with the terminology that suggests 

5 this was an experiment. 

6 Q: How was exposure to the “truth” campaign determined? 

7 A: The authors used cumulative gross rating points (“GRPs”) in each of the 210 television 

8 markets from 2000 to 2002. Within a period, the average rating for a media schedule or ad 

9 campaign is multiplied times frequency, which is the number of times the ads were run. So a one 

10 week schedule of ten “truth” commercials that achieved an average rating of 20 in a particular 

11 media market would produce 200 GRPs. 
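
The GRP calculation described in this answer can be sketched in a few lines. The function name is illustrative; the inputs are the ones given in the testimony's example (an average rating of 20 across ten airings).

```python
def gross_rating_points(average_rating, frequency):
    """GRPs for a schedule: average rating multiplied by the number of airings."""
    return average_rating * frequency

# One-week schedule from the example: ten commercials, average rating of 20.
weekly_grps = gross_rating_points(average_rating=20, frequency=10)
print(weekly_grps)  # 200
```

Cumulative GRPs of the kind the authors used would then be the sum of such figures over the period studied.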

12 Q: How much did the “truth” GRPs vary? 

13 A: The cumulative GRPs ranged from 647 to 22,389 across five groupings of media 

14 markets. The authors mapped those 5 exposure groups across the 48 contiguous states. The 

15 GRP ranged from 647 to 4995 in the lowest group, and 18,042 to 22,389 in the highest group. 

16 The average GRP for the low group was 3867, and 20,367 for the high. 

17 Q: What was the overall average GRP level? 

18 A: That information is not in the article, but according to Dr. Farrelly’s statement on a web 

19 site, the overall average was about 16,000. 

20 Q: Is the web site to which you just referred reflected on JD-025212?

21 A: Yes, it is. It is a web site entitled, “The Rest of the Story: Tobacco News Analysis and 

22 Commentary” run by Michael Siegel. 

23 Q: What data did the authors use to determine smoking prevalence? 
1 A: They used data for current smoking prevalence from Monitoring the Future for 8th, 10th

2 and 12th graders from 1997-2002. The measure used was any cigarettes smoked in the past 30

3 days. 

4 Q: How did the authors attempt to evaluate whether exposure to the “truth” ads

5 influenced youth smoking behavior? 

6 A: They used logistic regression to estimate the odds ratio of smoking according to GRP 

7 dose. 

8 Q: What was the relationship they estimated between the odds of smoking and GRP 

9 dose? 

10 A: That depended upon the model. When they estimated the odds ratios using a linear

11 term for GRP, in other words, by specifying a stable rate of change in smoking behavior with

12 each 10,000 GRP “dose,” none of the odds ratios was statistically significant, either for the 

13 overall model, or for any of the 8th, 10th, or 12th grade subgroups. The models on which they

14 base their conclusions incorporated what is called a quadratic term in the model for GRP

15 exposure. 

16 Q: What is a quadratic term? 

17 A: It is one of a number of equations that can be used to allow a model to find a better “fit” 

18 with the actual data, when the data do not fall in a straight line. Consider the hypothetical data in 

19 the curve below: 
[Demonstrative JDEM-020263: plot of hypothetical data, not reproduced]

2 A curved line called a parabola might “fit” those hypothetical data better than a straight line, in

3 part because the points at either extreme are weighted more heavily in the equation. 

4 Q: What happened when the authors fit a quadratic term? 

5 A: We don’t have the authors’ actual data. We know they concluded that a straight line did

6 not fit the data as well, and that adding the quadratic term provided a better fit, since the model

7 then estimated statistically significant odds ratios for 8th graders and for all grades using the

8 1997-2002 period. 

9 Q: What are the most relevant curves that were fit to the data? 

10 A: The curves for the 2001 and 2002 time periods are the most relevant because the 

11 Monitoring the Future survey in 2000 essentially coincided with the “truth” “campaign launch in 

12 February 2000” (Farrelly 2005 p. 426). The estimated curves for the cumulative GRP “dose” 
Q: What is the best way to visualize the estimate that their quadratic model is making?

A: Quadratic curves change direction somewhere on the x-axis, as the curves do here. The 

key issue is the shape of the curve within the relevant range of the data. Although the authors 
truncated Figure 2 in the article so that the curves don’t show all of the data, there is a downslope 
in the area of the lowest GRP “doses” on the left side of the figure, and an upslope in the area of 
higher GRP “doses” on the right side. 

Q: Don’t the authors go on to say that the quadratic term shows an attenuation of that 

effect? 

A: Yes, but the odds ratio for the linear “truth” variable equates with a negative coefficient, 
1 the quadratic with a positive. Neither by itself represents the “net effect” of GRP dose; instead, you

2 must consider them in combination. Here, the quadratic term doesn’t just show attenuation in 

3 the range of relevant data, it shows an upward slope on the right. An attenuation would simply 

4 mean that as exposure increased the effect leveled off. 

5 Q: What do the above estimated curves mean? 

6 A: If you accept their model, they mean that at low exposure levels the amount of smoking 

7 decreased as exposure increased. Then, across the entire middle area, “dose” had no effect. But 

8 after the “inflection point,” which is where the curve changes direction, “dose” was associated 

9 with an increase in smoking. 
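
A minimal sketch of how a negative linear coefficient and a positive quadratic coefficient combine into a curve that falls and then rises. It assumes the log-odds take the form b1*x + b2*x^2 with x measured in units of 10,000 GRPs; the coefficient values are invented for illustration and are not the estimates in Table 2 of the 2005 paper.

```python
import math

def odds_ratio_at(dose, b_linear, b_quadratic):
    """Odds ratio versus zero exposure implied by log-odds = b1*x + b2*x^2."""
    return math.exp(b_linear * dose + b_quadratic * dose ** 2)

def turning_point(b_linear, b_quadratic):
    """Dose at which the curve stops falling and starts rising (needs b2 > 0)."""
    return -b_linear / (2.0 * b_quadratic)

B_LINEAR = -0.5      # hypothetical negative linear coefficient
B_QUADRATIC = 0.25   # hypothetical positive quadratic coefficient

print("turning point:", turning_point(B_LINEAR, B_QUADRATIC))
for dose in (0.0, 0.5, 1.0, 1.5, 2.0, 2.2):
    print(dose, round(odds_ratio_at(dose, B_LINEAR, B_QUADRATIC), 3))
```

With these invented values the odds ratio is below 1 at moderate doses, returns to 1 at a dose of 2.0, and exceeds 1 beyond that, which is the pattern the answer describes for the highest-exposure markets.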

10 Q: Why do the curves in Figure 2 stop at 16,000 GRPs? 

11 A: I do not know. As I mentioned before, the measured “doses” went above 22,000 GRPs. 

12 Given the shape of the curves and what we know about the average GRP in the “high” GRP 

13 group, some media markets above 16,000 GRPs are influencing the shape of the curves. In a 

14 message on Dr. Siegel’s web site, Dr. Farrelly said he cut the graph off at 16,000 because that 

15 was the average number of GRPs. I cannot explain why the authors would exclude the data for 

16 markets with GRP exposure above the average. 

17 Q: Have you attempted to extend the curves to show the range for all the data? 

18 A: Yes. I was able to do that with the data in Table 2. 

19 Q: Are your rescaled graphs shown below? 

20 A: Yes, they are, first for all grades combined and then for 8th graders by themselves.

21 
2 Q: What do those graphs indicate?

3 A: According to the results presented in the paper, “truth” was associated with a change in 

4 prevalence only among 8th graders, but not among 10th or 12th graders. So let’s look at the curve

5 for the 8th graders. In the media markets with the most exposure, the odds of smoking are higher

6 than they are among 8th graders not exposed to the “truth” campaign at all.

7 Q: What does that mean? 

8 A: That means that, according to their model, kids with the most exposure to “truth” smoke 

9 more than those with no exposure to “truth” compared to the baseline period. 

10 Q: Did the authors discuss that apparent “counterproductive” effect? 

11 A: No. 
Q: In your opinion, even if we accept them at face value, are these odds ratios large

enough to be meaningful? 

A: No. As I said before, it is a big challenge for observational studies to make an unbiased 

estimate where the magnitude of the hypothesized association is small. That is because of the 
possibility that known and unknown confounding factors or other biases affected the results. 

Q: Dr. Healton testified that analyzing for the quadratic term was necessary because 

the outcome variable is a “yes-no,” meaning that the question the respondents were asked - 
Have you smoked in the last month? - could be answered either yes or no. She said, “[T]he 
linear fit is not permitted unless the dependent variable is a continuous range of numbers. 

If it’s just yes or no you have to fit to it a quadratic.” (20956:16-18.) 

Is she correct? 

A: No. In fact, what she says makes no sense. 

Q: Would you explain? 

A: Yes. Whether you use a quadratic term has nothing to do with whether the outcome is a 

“yes-no.” As I said, you use a quadratic term when a linear relationship is insufficient to
describe the relationship between the independent variable - here, GRP - and the dependent or
outcome variable - here, youth smoking. I suspect what she meant was that you can’t fit a linear
regression for a yes-no variable; you have to use logistic regression. I think she was confusing 
linear vs. quadratic with linear vs. logistic. It’s a little like right vs. wrong and right vs. left. 

Q: Did the authors of Farrelly 2005 in fact test a linear relationship? 

A: Yes, that was the linear test that I mentioned above that was not statistically significant. 

Q: Does that show that the authors recognized that Dr. Healton’s testimony was 

incorrect and that a quadratic term is not required when the outcome is a “yes-no?” 
1 A: Yes, it does.

2 Failure to Control for Confounders 

3 Q: Did the investigators control for potential confounders? 

4 A: Yes, they did, at least some confounders. 

5 Q: Was the authors’ attempt to control for individual confounders adequate? 

6 A: No. 

7 Q: Why not? 

8 A: There were at least two reasons. First, the differences between the media markets with 

9 the highest and lowest GRPs were too great for them to be controlled for. Second, some 

10 confounders that were not controlled for could easily explain the estimated regression curves. 

11 The Relationship Between Race and GRP “Dose” 

12 Q: Didn’t the authors say they had controlled for differences between GRP groups? 

13 A: The authors do say that they controlled for some background factors that are associated 

14 with smoking, such as race and ethnicity, by including them as additional explanatory or 

15 “control” variables in the regression model. And, in fact, their regression estimates indicate that 

16 race was quite influential in smoking prevalence. But just including variables in a model doesn’t 

17 ensure that you have provided a reliable means of statistical control or an unbiased estimate of 

18 the odds of smoking. 

19 Q: Why not? 

20 A: Because there are limits to how different the various groupings of data can be when you 

21 rely on regression methods to control for differences. Remember, by using regression, you are 

22 trying to replicate the situation in which exposure was randomized, and the people in the various 

23 GRP exposure groups were exactly alike except for their GRP “dose.” Because exposure to GRP 
1 was not randomized, there is no a priori reason to assume the GRP groups are not substantially

2 different. The authors relied on their regression method to adjust, or control, for any differences 

3 in the groups. Regression works reasonably well when the difference between groups on the 

4 control variable isn’t too big, but not at all well when the difference is statistically large. 

5 Q: Isn’t the point of including “control” variables in a regression equation to control

6 for the potential influence of known confounding variables? 

7 A: It is, but whether a regression model can do that validly and accurately depends upon 

8 how different the groups are. To simplify things, let’s consider the case with just one control 

9 variable. To slightly over-generalize, when an odds ratio is estimated for GRP exposure, the

10 model adjusts for the correlation between GRP exposure and the control variable. The validity 

11 of the GRP exposure odds ratio as well as the overall estimate of the odds of smoking depend 

12 upon how well the GRP exposure groups “match” with respect to the control variable. If the 

13 GRP groups are fairly similar with respect to the control variable, then regression can provide 

14 relatively unbiased adjustments. If there is a statistically large difference, a regression model 

15 can’t make an estimate that is actually based on the data, and we know from empirical work that 

16 it produces biased and erratic estimates. Regression can tweak together groups that are close to 

17 each other; it can’t make groups that are very dissimilar the same. 

18 Q: Could you provide a more detailed example to illustrate? 

19 A: Suppose we were attempting to compare smoking behavior in two groups that had been 

20 exposed to different levels of the “truth” campaign, and we had reason to believe the mean ages 

21 of the two groups were very different. We wouldn’t have ended up with that age difference if 

22 “truth” exposure had been randomized, but in this example we are using observational data. To 

23 try to separate the influence of age from GRP exposure on the odds of smoking, we would 
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1 include GRP exposure as our focal predictor variable, with age as a control variable, and 

2 smoking status the outcome variable. What regression needs to “control” for age adequately is 

3 enough “overlap” in the ages of the groups so that the regression model has enough data, in 

4 instances where people are nearly “matched,” to create a reliable statistical basis for controlling. 

5 But if the age characteristics in the GRP exposure groups being compared don’t provide an 

6 adequate match because they differ substantially, then the regression model’s estimate of how 

7 GRP exposure influences the odds of smoking will be biased. 
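
The idea of “overlap” in this answer can be made concrete with a small sketch. It is only an illustration under stated assumptions: the ages below are hypothetical, and `overlap_share` is an invented helper, not a measure used in the testimony or in the Farrelly papers. It reports the fraction of all observations that fall inside the age range common to both groups.

```python
def overlap_share(group_a, group_b):
    """Fraction of all observations lying in the range shared by both groups."""
    common_low = max(min(group_a), min(group_b))
    common_high = min(max(group_a), max(group_b))
    combined = list(group_a) + list(group_b)
    if common_low > common_high:
        return 0.0  # the two ranges do not intersect at all
    inside = [x for x in combined if common_low <= x <= common_high]
    return len(inside) / len(combined)

# Hypothetical ages: two groups that are nearly "matched" ...
similar_share = overlap_share([13, 14, 15, 16, 17], [14, 15, 16, 17, 18])

# ... and two groups whose ages barely intersect.
dissimilar_share = overlap_share([12, 13, 13, 14, 14], [14, 16, 17, 17, 18])

print(similar_share, dissimilar_share)
```

In the second case most observations have no counterpart of similar age in the other group, so any age adjustment rests on the model's extrapolation rather than on matched data.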

8 Q: Is age the example you plan to discuss? 

9 A: No, the specific example I have analyzed and plan to discuss is race. I saw from looking 

10 at the map of the media markets in Figure 1 of the Farrelly 2005 article very strong indications 

11 that the “high” and “low” exposure groups differed in terms of their racial make-up. Race was 

12 included as a control variable in the Farrelly 2005 regressions, so it is only if the racial 

13 characteristics of the “high” and “low” GRP groups provide an adequate statistical match that 

14 regression estimates can provide a reasonably accurate estimate of how GRP “dose” influences 

15 the odds of smoking. But if the between-group race differences are statistically large, regression 

16 simply can’t provide a reasonably accurate answer. Instead, the regression adjustment and 

17 supposed “control” can easily be highly biased, rendering the regression estimates unreliable. 

18 Q: Why does a large difference between the groups “matter” to whether regression can 

19 provide a reliable estimate of the influence of a confounding variable? 

20 A: The problem is related to how the mathematics underlying regression equations work. 

21 When you include control variables in a standard logistic regression equation to compute an odds 

22 ratio for smoking based on GRP exposure, the model estimates how GRP exposure changes the 

23 odds of smoking on the basis of a simultaneous calculation of how the control variable also 
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1 changes the odds of smoking. Regression models aren't smart enough to “see” whether the 

2 values for control variables are closely matched across the groups, or to estimate the influence of 

3 the control variable only on the basis of the values of the control variables in which the matches 

4 are “good.” Instead, regression uses the total number of observations, regardless of how closely 

5 the values match, and the model “thinks” the accuracy of its estimate is based on all of the values 

6 it compares instead of just the values that overlap and provide reasonably good statistical 

7 “matches.” 

8 Q: Is that just a hypothetical issue that really doesn’t apply here? 

9 A: Absolutely not. When the Farrelly 2005 authors stated that they were controlling for the 

10 influence of African American, Hispanic, and Asian race, their regression model estimated how 

11 GRP exposure influenced the odds of smoking on the basis of a simultaneous calculation of how 

12 race impacted the odds of smoking. But the model was “blind” to whether the race variables in 

13 the “high” and “low” exposure groups actually matched each other well. The problem is that 

14 race needs to match fairly well across the “high” and “low” GRP groups in order to create an 

15 unbiased estimate of the odds of smoking. 

16 Q: Is the phenomenon you are describing just statistical theory, or has it been 

17 empirically demonstrated? 

18 A: Technically, it follows from the mathematics of regression equations, but it has also been 

19 empirically demonstrated. Don Rubin, another expert who is testifying in this case, was one of 

20 the statisticians who first demonstrated how to make that type of analysis in a paper he wrote 

21 with Prof. William Cochran, one of my former professors. Professors Cochran and Rubin 

22 compared experimental data to regression methods and provided quantitative guidance on the 

23 magnitude of the resulting bias. 
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1 Q: Did you analyze whether the regression methods the Farrelly 2005 authors used 

2 were able to provide statistical control for race? 

3 A: Yes. I first evaluated whether, when you compared the racial make-up of the “high” and 

4 “low” GRP exposure groups, there were any obvious causes for concern. I saw that there were. 

5 The map of media markets in Figure 1 of the 2005 article clearly shows that the high GRP areas 

6 were clustered around larger urban areas, while the low GRP areas were much more rural. 

7 Remember, if the “doses” of GRP exposure had been randomized, you wouldn't expect to see 

8 any differences in race based on GRP “dose.” But Figure 1 clearly points to a difference, and 

9 that difference suggests the presence of other associated sociodemographic differences. In fact 

10 the authors acknowledge that “low exposure markets tended to be more rural, White, and less 

11 educated and have lower incomes - all factors associated with smoking - than markets with high 

12 campaign exposure.” So just the map of “high” and “low” GRP dose areas provided a strong 

13 clue that different underlying characteristics in the samples represented by the “high” and “low” 

14 GRP groups might cause a problem in using a regression equation to model the “dose” of GRP 

15 and the “response” of smoking while “controlling” for variables such as race, as is depicted in 

16 Figure 2 of the article. 

17 Q: Did you evaluate if the different racial distributions between the “high” and “low” 

18 GRP dose groups created a problem? 

19 A: Yes, and I confirmed that the “highs” and “lows” had very pronounced differences in the 

20 racial make-up. In fact, those differences are so large they render the entire set of regressions in 

21 the article uninterpretable. The fundamental problem relates back, in part, to something I 

22 discussed earlier in my testimony, as well as something I just mentioned. The authors of the 

23 Farrelly 2005 article did not use a study design that randomized exposure to the “truth” 
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1 campaign. Instead, they used non-randomized observational data for the critical variable of GRP 

2 exposure. Without randomization, there is no scientific basis to assume that the “high” and 

3 “low” GRP groups were racially similar, or were similar in any of the wide variety of 

4 background characteristics that are important predictors of smoking behavior. In fact, as we just 

5 saw, the authors acknowledged the groups differed with respect to several important predictors 

6 of smoking prevalence. So the critical question they should have evaluated and answered, in 

7 terms of the validity of their regression estimates, is: “How different are the groups”? As it turns 

8 out, the groups are just too different for regression to provide an unbiased estimate of the 

9 independent effects of “truth” on the odds of smoking. 

10 Q: Could you please describe the analysis you made? 

11 A: The first question I addressed is which groups would make the most meaningful 

12 comparison. The answer is where there is the strongest indication of GRP dose “effect.” Even 

13 though the Farrelly 2005 authors did not provide any of the actual “points” associated with the 

14 curves they estimate, just by looking at the shape of the curve, the “high” and “low” GRP groups 

15 must dominate the shape of the “dose-response” relationship that the regression equation is 

16 modeling. If we recall the basic shape of the curves for 2001 and 2002, the curve is essentially 

17 flat in the middle GRP “dose” ranges, which means that the model is estimating essentially no 

18 change in smoking behavior throughout a broad range of GRP “doses,” from about 5,000 GRPs 

19 to around 13,000 GRPs. Only near the two extremes at the “low” and the “high” exposure 

20 groups, does the model predict changes in smoking behavior. Because those two groups are the 

21 most influential in the “dose-response” curve, they are the most appropriate to evaluate. 

22 Q: How did you evaluate the “high” and “low” GRP “dose” groups? 

23 A: I segregated the media markets that belonged to the “high” and “low” GRP groups and 
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1 used data from the 2000 census to determine the racial composition of the various media 

2 markets. There were 20 media markets in the “high” group and 61 in the “low” group. The 

3 results, depicted in the table below, show very large differences in the racial proportions of the 

4 “high” and “low” groups. 



5 

6 Q: Did those results answer the question? 

7 A: Perhaps not completely, but I think almost any experienced biostatistician looking at 

8 those results would recognize that the racial make-up of the two groups differed so greatly that 

9 regression probably wasn't going to provide an unbiased estimate of the odds of smoking based 

10 on GRP dose. 

11 Q: What other analyses did you make? 

12 A: I estimated another logistic regression equation for the “high” and “low” media markets. 

13 In this equation, I used the race predictor variables, and estimated the logarithm of the odds, 

14 called a “logit,” to predict if each particular media market was “high” or “low.” Obviously, I 

15 already knew to which group each media market belonged, but this analysis served two purposes. 

16 First, if GRP “dose” had been randomized, you would expect this equation to have essentially 

17 zero predictive power. In other words, had exposure been randomized, the logit based on race 

18 variables for a media market shouldn't predict whether that media market was in the “high” or 

19 “low” dose group with any degree of accuracy, because exposure would have been random and 

20 not associated with race. But, as it turns out, the race variables alone predict “high” or “low” 

21 GRP status with about 90% accuracy. Second, because the logits have very useful mathematical 

22 properties, I was able to use them to show how different the “high” and “low” groups were with 
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1 respect to the race variables. 

2 Q: What measures of race differences did you calculate? 

3 A: I had calculated logits for each media market within the “high” and “low” groups. 

4 Therefore, I could also calculate the average logit for the “high” and “low” groups, as well as their 

5 variances. 
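
The kind of model described in the preceding answers, a logistic regression that predicts group membership from a background characteristic, can be sketched as follows. This is a minimal illustration and not the witness's analysis: the percent-minority values are hypothetical, a single covariate stands in for the several race variables, and `fit_logistic` is a simple gradient-ascent fit written for this example rather than any particular statistical package.

```python
import math

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical percent-minority figures for "low" (0) and "high" (1) markets.
low_markets = [2, 4, 5, 7, 9, 30]
high_markets = [8, 25, 28, 35, 40, 45]
raw = low_markets + high_markets
labels = [0] * len(low_markets) + [1] * len(high_markets)

# Standardize the covariate so the gradient steps are well scaled.
mean = sum(raw) / len(raw)
sd = math.sqrt(sum((x - mean) ** 2 for x in raw) / len(raw))
xs = [(x - mean) / sd for x in raw]

b0, b1 = fit_logistic(xs, labels)
logits = [b0 + b1 * x for x in xs]           # one logit per market
predicted = [1 if z > 0 else 0 for z in logits]
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print(round(accuracy, 2))
```

If group membership had been assigned at random, a covariate like this would be expected to predict membership no better than chance; high accuracy signals that the groups differ systematically on that covariate.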

6 Q: Why did you calculate the “high” and “low” group means and variances? 

7 A: Because they show how different the groups really are. They are the statistics that are 

8 used to evaluate whether the racial differences between the “high” and “low” exposure groups 

9 are so large that regression cannot provide valid statistical control. 

10 Q: What were the results? 

11 A: The logit mean in the “low” GRP group was about -2.8, with a variance of about 1.9, and 

12 the logit mean in the “high” group was about 1.4, with a variance of about 6.9. 
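
One conventional way to summarize how far apart two groups are, given their means and variances, is a standardized difference in means together with a ratio of variances. The sketch below applies that idea to the rounded figures just quoted. The testimony does not state the exact formula used, so the pooled-variance form here is an assumption, and `standardized_difference` is a name chosen for this illustration.

```python
import math

def standardized_difference(mean_a, var_a, mean_b, var_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((var_a + var_b) / 2.0)
    return (mean_a - mean_b) / pooled_sd

# Rounded logit means and variances quoted in the answer above.
high_mean, high_var = 1.4, 6.9
low_mean, low_var = -2.8, 1.9

std_diff = standardized_difference(high_mean, high_var, low_mean, low_var)
variance_ratio = high_var / low_var

print(round(std_diff, 2), round(variance_ratio, 2))
```

Under this assumed formula the two group means sit roughly two pooled standard deviations apart, and the “high” group's variance is more than three times that of the “low” group.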

13 Q: Because people who aren’t statisticians generally don’t speak in terms of log-odds or 

14 logits, or even variances, can you put those numbers into a more familiar context? 

15 A: Logits can be transformed back into numbers that are easier for most people to 

16 understand. The model I estimated was the probability based on the racial characteristics that a 

17 market was in the “high” group. The average logit of -2.807 for the “low” GRP group 

18 corresponds to a percentage of about 6, and the average logit of 1.367 for the “high” GRP group 

19 corresponds to a percentage of about 80. That means that the probability that the racial 

20 composition would predict that a “low” media market falls within the high group is about 6 

21 percent and the probability that the model would predict that a “high” market falls into the high 

22 group is about 80%. Remember, we want those numbers to be not far from 50-50 for the 

23 estimates to be unbiased. 
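
The conversion from a logit back to a probability uses the standard inverse-logit (logistic) function, p = 1 / (1 + e^(-logit)). The short sketch below applies it to the two average logits quoted in the answer; the function name is chosen for this illustration.

```python
import math

def inverse_logit(logit):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

p_low = inverse_logit(-2.807)   # average logit, "low" GRP group
p_high = inverse_logit(1.367)   # average logit, "high" GRP group

print(f"low group:  {p_low:.1%}")
print(f"high group: {p_high:.1%}")
```

Both results round to the figures given in the testimony, about 6 percent and about 80 percent.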
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1 Q: Were there any further steps in your analysis? 

2 A: I used those same means and variances to evaluate whether regression could reliably 

3 adjust or control for the bias created by the differences in the racial characteristics of the two 

4 groups. 

5 Q: What did you determine? 

6 A: I found that the difference in race between the “high” and “low” GRP exposure groups 

7 was quite large; in fact it was nearly double the difference that Profs. Cochran and Rubin set as 

8 the outer limit for testing whether regression might be effective in reducing this type of bias. 

9 The mean values were far too dissimilar, and the shapes of the distributions around those means 

10 were also far too dissimilar, for the regression models to produce unbiased estimates. That 

11 extremely large difference extended well beyond the differences that Profs. Cochran and Rubin 

12 proved produced highly erratic and extremely inaccurate regression adjustments. 

13 Q: What is the consequence of those findings on the validity of the results that are 

14 reported in the Farrelly 2005 paper? 

15 A: It means that the results of the regression models, which form the basis upon which the 

16 authors asserted that exposure to the “truth” campaign was associated with reduced youth 

17 smoking prevalence, have such large potential bias that they can provide no support for the 

18 claimed association. 

19 Q: Now let’s discuss the confounders that they did not control for; what were they? 

20 A: The same confounders as in the 2002 study, local spending on tobacco control activities 

21 and peer smoking. Also, in this study they did not control for parental smoking. 

22 Q: You recall that we discussed Dr. Healton’s explanation for why they didn’t control 

23 for friends’ smoking. Was that a valid reason? 
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1 A: Her reason is valid in this particular study. Because the “truth” campaign may have 

2 affected friends’ smoking, you would not want to control for it. But that doesn't excuse the 

3 failure to separate out the effects of the “truth” campaign from those of friends’ smoking, which, 

4 as I said earlier, has a strong relationship with youth smoking. This is simply another reason 

5 why the authors should have performed a randomized experimental study. 

6 Q: How could they have avoided controlling for friends’ smoking in a randomized 

7 trial? 

8 A: That is the nature of randomization. If the study is large enough, other predictors such as 

9 friends’ smoking would be randomly distributed and therefore would not bias the results. 

10 Q: Is there a reason why they couldn’t have controlled for parental smoking in this 

11 study? 

12 A: No, the consideration that applies to peers doesn't apply to parents because the “truth” 

13 campaign is not directed at parents. The failure to control for parental smoking is a weakness in 

14 the study. 

15 Q: Dr. Healton was asked about the failure to control for parental smoking, and she 

16 said, “The parental smoking, we already know is not associated with GRP, and we discuss 

17 parental smoking at some length in the article, and yes it was de facto controlled for 

18 because it was not related to the GRP dose and [if] it’s not related to the GRP dose it 

19 couldn’t have influenced the study.” (20911:6-10.) First of all, was parental smoking 

20 discussed in the article? 

21 A: No. I didn't see a word about parental smoking. 

22 Q: Did the authors control for other variables without any indication of whether they 

23 were correlated with GRP? 
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1 A: Yes, they did. In fact, that is true for all other control variables, such as being male, the 

2 student's weekly income, and parental education. 

3 Q: Dr. Healton also testified that they could not control for parental smoking because 

4 that information was not obtained by Monitoring the Future. Is that a valid reason not to 

5 control for it? (20911:11-16.) 

6 A: I agree that without the information, they could not control for it. But they should have 

7 acknowledged that this is a significant problem, and that it weakened any potential inferences to 

8 be drawn from their study. It is still another reason why they should have done a randomized 

9 study. 

10 Failure to Allow for Multiple Analyses 

11 Q: You mentioned that the results using the quadratic GRP term were significant for 

12 8th graders but not 10th or 12th graders. What did the authors say about those disparate 

13 results? 

14 A: They did not discuss their significance. 

15 Q: What is the proper way to evaluate the results of a study that has multiple analyses 

16 like this? 

17 A: The possibility of getting a statistically significant result increases with the number of 

18 independent tests you do, simply by chance. The proper way to handle that is to formulate a 

19 hypothesis that takes that possibility into account. One technique is a “Bonferroni correction.” 

20 Q: Why does the possibility of getting a statistically significant result increase with the 

21 number of independent tests you do? 

22 A: Because statistical analyses are based on principles of probability. An epidemiologic 

23 study is designed to test the hypothesis that a particular result was not due to chance. We use a 
1 p-value of 0.05 to indicate that the outcome would occur by chance only once in 20 independent 

2 trials. But the more statistical tests you calculate, the greater the chance that one of them would 

3 be significant by chance. 
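
The point in this answer can be put in numbers. For independent tests each run at the same significance level, and assuming no real effect in any of them, the chance that at least one comes out “significant” is 1 - (1 - alpha)^k. The sketch below is a minimal illustration of that textbook formula; the function name is chosen for this example.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

for k in (1, 3, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:>2} tests: {rate:.1%}")
```

With three independent tests at the 0.05 level the chance of at least one spurious finding is already about 14 percent, and with twenty tests it is close to two in three.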

4 Q: How does the Bonferroni correction work? 

5 A: The Bonferroni correction works in one of two ways to make sure that you account for 

6 the greater likelihood of getting a p-value under .05 with multiple tests. You could multiply the 

7 lowest p-value that you obtain by the number of trials and compare it to 0.05 to see if the result 

8 can be attributed to chance. Or you could simply lower the threshold of the value for statistical 

9 significance by dividing it by the number of trials. 
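
The two equivalent forms of the Bonferroni correction described in this answer can be sketched as follows. The p-value of 0.04 is a hypothetical input chosen only to show the mechanics, and the function names are assumptions of this illustration.

```python
def bonferroni_adjusted_p(p_value, n_tests):
    """Multiply the p-value by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

def bonferroni_threshold(alpha, n_tests):
    """Divide the significance threshold by the number of tests."""
    return alpha / n_tests

alpha, n_tests = 0.05, 3
p_observed = 0.04  # hypothetical smallest p-value among the three tests

adjusted = bonferroni_adjusted_p(p_observed, n_tests)
threshold = bonferroni_threshold(alpha, n_tests)

# The two forms always reach the same decision.
significant_by_adjusted_p = adjusted < alpha
significant_by_threshold = p_observed < threshold
print(adjusted, threshold, significant_by_adjusted_p, significant_by_threshold)
```

Either form shows that a result that clears 0.05 on its own does not clear the corrected standard when it is one of three tests.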

10 Q: How do you apply that principle to the Farrelly (2005) results? 

11 A: First, you need to know their hypotheses. If they hypothesized that the “truth” campaign 

12 would have an effect on 8th graders but not 10th or 12th graders, then the test was consistent with 

13 their hypothesis, and no corrections would be required. The authors did not report their 

14 hypothesis, however, so we cannot conclude that. All they said was that the purpose of the study 

15 was to “assess[] whether there was a dose-response relationship between the level of exposure to 

16 the campaign and youth smoking prevalence during the first 2 years of the campaign.” So at 

17 least some correction for multiple analyses has to be made. 

18 Q: How do you do that? 

19 A: The authors did not report the actual p-value for the results among 8th graders, only that it 

20 was <0.05, so we can only do an approximation. 

21 Q: Isn’t it standard practice to report the exact p-values? 

22 A: It depends on the journal. I prefer the actual values. They actually did report the 

23 p-values for 10th and 12th graders. I cannot explain why they did not for 8th graders. But they did 
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1 report the confidence interval as 0.39 to 0.94, which means that the p-value is somewhat less 

2 than 0.05. Multiplying that times 3 gives us a corrected p-value of somewhat less than 0.15, 

3 which is not statistically significant. For that reason, we cannot rule out the possibility of chance 

4 as an explanation for these findings, even if the study did not have all the other problems I have 

5 mentioned. 
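
A rough p-value can be backed out of a reported confidence interval for an odds ratio. The sketch below does so for the 0.39 to 0.94 interval and then applies the factor of 3. It rests on assumptions the testimony does not confirm: that the interval is a 95% interval, symmetric on the log-odds scale, and based on a normal approximation. The function name is chosen for this illustration, and the result is an approximation only.

```python
import math

def p_value_from_ratio_ci(lower, upper, z_crit=1.96):
    """Approximate two-sided p-value implied by a CI for an odds ratio."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    std_error = (log_upper - log_lower) / (2.0 * z_crit)
    log_estimate = (log_lower + log_upper) / 2.0   # midpoint on the log scale
    z = log_estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal tail area

p_8th = p_value_from_ratio_ci(0.39, 0.94)
p_corrected = min(1.0, 3 * p_8th)   # Bonferroni factor for three grades

print(round(p_8th, 3), round(p_corrected, 3))
```

Under those assumptions the unadjusted value is roughly 0.025 and the corrected value roughly 0.08, which is below 0.15 but still above the 0.05 threshold, consistent with the conclusion stated in the answer.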

6 Q: Why did you not include the results for “all grades” in your correction? 

7 A: Because the comparisons were not independent. In fact, they were highly dependent on 

8 the 8th grade results. 

9 Attribution to “truth” of Youth Smoking Reduction 

10 Q: You mentioned that the authors concluded that “truth” accelerated the decline in 

11 youth smoking and that by 2002 youth smoking rates were 1.5 percentage points lower 

12 than in the absence of the “truth” campaign; do you accept that conclusion? 

13 A: No, I do not. 

14 Q: Why not? 

15 A: Most basically, making that kind of quantitative causal inference presents an 

16 extraordinary challenge. It would essentially require their model to be perfectly specified, but, as 

17 I have explained, it clearly was not. In addition, they improperly extrapolated from youth 

18 smoking rates in 1997-99 to estimate what youth smoking rates would have been during 2000 to 

19 2002 in the absence of “truth.” 

20 Q: Why was it improper to extrapolate from 1997-99 youth smoking rates? 

21 A: For several reasons. According to Monitoring the Future, smoking among 8th graders has 

22 been declining since 1996. The authors ended their baseline period at 1999 but, if they were 

23 going to have a baseline, they should have extended it to 2000. From 1999 to 2000, smoking 
1 among 8th graders declined from 17.5% to 14.6%, none of which should be attributed to “truth.” 

2 Q: Why do you say that, if they used a baseline period, the authors should have 

3 extended it to 2000? 

4 A: The authors state, “[A]s hypothesized, there was no statistically significant relationship 

5 between overall youth smoking prevalence and the campaign after only a few months of the 

6 campaign in 2000” (p. 428.) If they were hypothesizing no effect of the campaign in 2000, 

7 that year should have been part of the baseline. 

8 Q: From 1996-2000 what was the trend in smoking prevalence among 8th graders? 

9 A: During that period, smoking among 8th graders declined from 21.0% to 14.6%, a drop of 

10 30% in four years. 
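
The arithmetic behind these figures distinguishes a drop in percentage points from a relative (percent) decline. The sketch below computes both for the two periods mentioned in the testimony; the function name is chosen for this illustration.

```python
def decline(start_pct, end_pct):
    """Return (percentage-point drop, relative decline as a fraction)."""
    points = start_pct - end_pct
    relative = points / start_pct
    return points, relative

# 8th-grade smoking prevalence figures quoted in the testimony.
points_96_00, relative_96_00 = decline(21.0, 14.6)   # 1996 to 2000
points_99_00, relative_99_00 = decline(17.5, 14.6)   # 1999 to 2000

print(f"1996-2000: {points_96_00:.1f} points, {relative_96_00:.1%} relative")
print(f"1999-2000: {points_99_00:.1f} points, {relative_99_00:.1%} relative")
```

The 1996 to 2000 change is 6.4 percentage points, or about 30 percent in relative terms, matching the figure in the answer; the single year from 1999 to 2000 accounts for 2.9 of those points.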

11 Q: What is the significance of that decline? 

12 A: It means that other factors were already contributing to a substantial decline in youth 

13 smoking before “truth” began. 

14 Q: What should the authors have done to try to estimate the amount of smoking 

15 decline, if any, attributable to “truth?” 

16 A: Once again, they should have done a randomized study. It is really the only study 

17 architecture that could feasibly accommodate that sort of methodologic challenge. 

18 Use of 10,000 GRPs as the Basis for Calculation of Odds Ratios 

19 Q: You indicated that the authors used increments of 10,000 GRPs to estimate the ORs; 

20 was that an appropriate unit of analysis? 

21 A: No. It was a scaling that presented an exaggerated effect of the “truth” campaign. 

22 10,000 GRPs is nearly half the entire range of cumulative GRPs in the sample. As you can see 

23 from Figure 2, there was virtually no change in the entire 10,000 GRP range between 5,000 and 
1 15,000 GRPs in the curve for 2002. 
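
How the choice of unit changes the apparent size of an odds ratio can be shown with a short sketch. It assumes a model in which the log-odds change linearly with GRPs, which is a simplification: the testimony indicates the Farrelly 2005 model also included a quadratic GRP term. The odds ratio of 0.80 is hypothetical, and the function name is chosen for this illustration.

```python
def rescale_odds_ratio(odds_ratio, from_units, to_units):
    """Re-express an odds ratio per `from_units` as an odds ratio per `to_units`."""
    return odds_ratio ** (to_units / from_units)

or_per_10000 = 0.80  # hypothetical odds ratio per 10,000 GRPs
or_per_1000 = rescale_odds_ratio(or_per_10000, 10_000, 1_000)

print(round(or_per_1000, 3))
```

The same fitted relationship reads as a 20 percent reduction in odds per 10,000 GRPs but only about a 2 percent reduction per 1,000 GRPs, so the reporting unit strongly shapes how large the effect appears.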

2 Criticisms by Others of Farrelly (2005) 

3 Q: Has this study drawn criticisms from others? 

4 A: Yes, it has. 

5 Q: What are JD-055221 and JD-025210? 

6 A: Those are two electronic letters to the editor regarding this study by Joel M. Moskowitz 

7 to the American Journal of Public Health and the British Medical Journal. 

8 Q: Who is Dr. Moskowitz? 

9 A: According to the website, he is Director of the Center for Family and Community Health 

10 at the University of California at Berkeley. 

11 Q: What did he say about the study? 

12 A: He made two points. First, he pointed out the U-shaped (that is, quadratic) relationship 

13 between “truth” advertising and youth smoking that I discussed above. He stated, “Figure 2 

14 censors the rightmost portion of this relationship by truncating GRP at 15,000 (p. 430).” He also 

15 stated, “The results suggest that the campaign had no detectable effect on smoking prevalence 

16 among those who resided in media markets that received higher levels of exposure which 

17 included students in most major metropolitan areas.” 

18 Q: Do you agree with him? 

19 A: Yes, in part. If I believed the model, I would agree with his statement. But, as I said 

20 before, there are problems with the model. 

21 Q: Dr. Healton testified that Dr. Moskowitz has withdrawn his criticism. She stated: 

22 “I think he’s taken a different position since this was put out, because I don’t think he 

23 understood it fully.” (20957:10-12.) To your knowledge has Dr. Moskowitz withdrawn his 
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1 criticism? 

2 A: I have not seen any indication of it. I just checked the web sites of AJPH and BMJ, 

3 where his criticisms were posted, and they are still there with no indication that he has changed 

4 his mind. Here are the urls: http://bmj.bmjjournals.com/cgi/eletters/330/7491/559 (British 

5 Medical Journal)(JD-025210) and http://www.ajph.Org/cgi/eletters/95/3/425 (American Journal 

6 of Public Health)(JD-055221). 

7 Q: Have there been other criticisms of this study? 

8 A: Yes, there have. 

9 Q: What is JD-025212? 

10 A: That is an exchange between Dr. Farrelly and Michael Siegel, Ph.D., on Dr. Siegel’s web 

11 site, under the title, “AJPH Paper on ‘truth’ Campaign Suggests No Effect on Youth Smoking.” 

12 Q: Who is Michael Siegel? 

13 A: On his website, he identifies himself as “an Associate Professor in the Social and 

14 Behavioral Sciences Department, Boston University School of Public Health ... [with] 20 years 

15 of experience in tobacco control, primarily as a researcher.” 

16 Q: What did Dr. Siegel say about this study? 

17 A: He said that he had evaluated the study’s findings in light of the alternative interpretation 

18 presented by Dr. Moskowitz and concluded “that they fail to support the conclusion of a 

19 significant effect of the ‘truth’ campaign on youth smoking prevalence.” 

20 Q: What was the basis of his conclusion? 

21 A: One basis was that, without the quadratic term, the results were not statistically 

22 significant. In other words, “the study failed to find any significant relationship between the 

23 intensity of exposure to the ‘truth’ campaign and youth smoking prevalence under the 
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1 assumption that campaign effects would increase linearly with campaign exposure.” (p. 2.) 

2 He also described the effects at the highest level of exposure, pointing out that there was 

3 no effect of the campaign at those levels, and in the case of 8th graders, actually more smoking at 

4 the highest levels than with no exposure at all. He stated that “the pattern was not consistent 

5 with a hypothesis of diminishing campaign effects, but rather, consistent with the absence of an 

6 effect of the campaign on youth smoking prevalence.” 

7 He concluded: “The important point, I think, is that the results just don’t appear to 

8 support a conclusion that the ‘truth’ campaign resulted in a significant decrease in youth smoking 

9 prevalence. They certainly do not, I believe, support a causal conclusion.” 

10 Q: Earlier you mentioned that Dr. Farrelly responded to Dr. Siegel; was he able to 

11 persuade Dr. Siegel that the study was not flawed? 

12 A: He apparently persuaded Dr. Siegel about the validity of allowing for fixed effects to 

13 control for media-market variables. But Dr. Siegel appeared to remain troubled by the absence 

14 of a dose-response relationship, especially at higher levels of exposure. 

15 AUTHORSHIP 

16 Q: Before we conclude, Dr. Wittes, I would like to ask you a few questions about 

17 information that was disclosed about the authorship of these papers. First, what is 

18 expected regarding reporting of financial support to the authors of a published paper? 

19 A: It is expected that all financial support will be fully disclosed in the publications, 

20 generally in the Acknowledgements section. 

21 Q: I want you to assume that Dr. Healton testified that ALF pays RTI to undertake its 

22 evaluation work (20841:25 to 20842:1) and that it has paid RTI $21 million over the years 

23 to evaluate a broad range of programs (20842:4-8). Was that financial support reported in 
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1 either paper? 

2 A: The 2002 paper did not mention financial support. In the 2005 paper, the authors merely 

3 state that the study was supported by ALF (p.431) but do not state that RTI is ALF’s contractor 

4 for the evaluation of its programs. Ten years ago, this might have been a sufficient disclosure, 

5 but not under today’s standards. 

6 Q: Dr. Healton testified that neither she nor the other authors had access to the 

7 University of Michigan data on which the study was based. (20948:5-10.) Is there a 

8 requirement by scientific journals that the authors of a study have access to the data on 

9 which the study is based? 

10 A: Not in all journals. But some influential ones do. For example, referring again to the 

11 requirements in JAMA, there is a specific requirement about this. It states: “For original data, at 

12 least 1 author (e.g., the principal investigator) should indicate that she or he ‘had full access to all 

13 the data in the study and takes responsibility for the integrity of the data and the accuracy of the 

14 data analysis.’” 

15 Q: What did Farrelly (2005) state about data access? 

16 A: They thanked the principal investigators of Monitoring the Future for “providing timely 

17 access to the MTF data.” (p. 431.) 

18 Q: Is that statement accurate? 

19 A: Not based on Dr. Healton's testimony. 

20 Q: Would your testimony about the requirement that at least one author have access to 

21 the data change if it were the case that the University of Michigan were prevented by law 

22 from providing the Farrelly authors the data? 

23 A: No. At least one of the authors should have access to the data. If none of these authors 
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1 could legally access the data, then someone with access should have been an author. 

2 Q: Would you tell the court your background in issues regarding how to acknowledge 

3 financial support and regarding authorship of published papers? 

4 A: I am current chair of the Policy Committee of the Society for Clinical Trials, which is 

5 currently wrestling with these issues. In fact, the issue is on the agenda for the meeting of this 

6 Committee the week of May 23. 

7 CONCLUSION 

8 Q: In conclusion, Dr. Wittes, do these two studies provide adequate scientific evidence 

9 of a causal connection between the “truth” campaign and youth smoking attitudes or 

10 behavior? 

11 A: No. For the reasons I have expressed, they do not. 

12 Q: Thank you very much, Dr. Wittes. 
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